DOCUMENT RESUME

ED 028 633 24 EM 000 317
By-Page, Ellis B.; Paulus, Dieter H.
The Analysis of Essays by Computer. Final Report.
Connecticut Univ., Storrs.
Spons Agency-Office of Education (DHEW), Washington, D.C. Bureau of Research.
Bureau No-BR-6-1318
Pub Date Apr 68
Contract OEC-16-001318-1214
Note-280p.
EDRS Price MF-$1.25 HC-$14.10
Descriptors-Computer Assisted Instruction, Content Analysis, Educational Diagnosis, Educational Testing, *Essays, Essay Tests, Information Storage, Instructional Technology, Predictive Ability (Testing), Predictive Measurement, Psychometrics, *Simulation, Statistical Analysis, *Structural Analysis, Student Writing Models, Time Sharing, *Writing Skills
Identifiers-PEG, *Project Essay Grade

This study aimed at expanding a new field of educational measurement, by investigating the feasibility of using computer programs for the automatic analysis and evaluation of student writing. Essays written by secondary students in their English classes were rated by multiple independent judges on a number of traits usually considered important: content, organization, style, mechanics, creativity, and overall quality. The essays were key-punched for input to the computer. Computer programs were written to analyze the essays, performing many tests and list lookup procedures, and producing a profile of "proxes" (variables believed to be approximations of important dimensions of the essays). These proxes were then combined through multiple regression to optimize the prediction of the expert judgments. Across various essays, judges, students, and traits, the computer performed about as accurately (in predicting the expert group) as did the typical human judge. Many other dimensions of the problem were examined, including the use of cliches, passive verbs, and syntactic parsing. A plan of attack was outlined for future investigators. (Author)

Final Report

Project No. 6-1318 Contract No. OEC-16-001318-1214

Principal Investigator: Ellis B. Page

THE ANALYSIS OF ESSAYS BY COMPUTER

by

Ellis B. Page

and

Dieter H. Paulus

The University of Connecticut
Storrs

April, 1968

U.S. DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
Office of Education
Bureau of Research

U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION POSITION OR POLICY.


The research reported herein was performed pursuant to a contract with the Office of Education, U.S. Department of Health, Education, and Welfare. Contractors undertaking such projects under Government sponsorship are encouraged to express freely their professional judgment in the conduct of the project. Points of view or opinions stated do not, therefore, necessarily represent official Office of Education position or policy.

TABLE OF CONTENTS

TITLE PAGE
TABLE OF CONTENTS
PREFACE BY THE PRINCIPAL INVESTIGATOR

THE TEXT

CHAPTER I: INTRODUCTION
    The Practical Background
    The Input Question
    The Theoretical Background
    Related Research
    Background Disciplines
    Psychometrics
    Linguistics
    Curriculum
    Automatic Language-Data Processing
    Statistical Methodology
    Computer Technology
    Objectives of the Research

CHAPTER II: THE BASIC DESIGN
    Rationale

CHAPTER III: THE INITIAL PROXES
    Sampling
    Hypotheses and Proxes
    The Computer Program
    Preparation of the Text
    Summary

CHAPTER IV: PREDICTING OVERALL QUALITY
    Human Ratings
    Overall Prediction
    Correlation of the Proxes With the Criterion
    Multiple Regression
    The Paulus Tables
    Use of the Tables
    Reliability of Proxes
    Human and Machine Judgments
    Human vs. Machine "Validity"
    Human vs. Machine Accuracy
    Using One Essay's Proxes for Another Essay's Criterion
    Cross-validation with Same Essays
    Practical Implications
    Summary

CHAPTER V: PREDICTING A PROFILE OF RATINGS
    The Sample
    The Rating Session
    The Rating Criteria
    Contribution of the Proxes
    The Uniqueness of the Traits
    Judge Viewpoints
    Trait Prediction by Machines
    Summary

CHAPTER VI: PROBLEMS OF STATISTICAL IMPROVEMENT IN PREDICTION
    The Problem of Linearity
    Interactions
    Transformations
    Discussion
    Summary

CHAPTER VII: PHRASE LOOKUP AND ITS IMPLICATIONS
    The Phrase Look-up Procedure
    An Application to Cliches
    Background on Cliches
    A Search for Psychological Characteristics
    Correlative Conjunctions
    Verb Constructions
    Parenthetical Expressions
    Summary

CHAPTER VIII: ON-LINE ANALYSIS AND FEEDBACK
    Background
    The Program
    Summary

CHAPTER IX: CONCLUSIONS AND IMPLICATIONS
    Summary of Work Completed
    Rationale
    Findings
    Some Work Related to the Project
    Journals
    Societies
    Textbooks
    Other Books
    Recent Related Work
    Parsing
    Multiple Path System
    Discourse Analysis
    Transformational Grammar
    Semantics
    Future Work in Essay Analysis
    Need for Flexibility
    Grading of Content
    Further Analysis of Style
    Hypothetical Complete Essay Analyzer

APPENDICES

APPENDIX A. Source Program for Essay Analysis

APPENDIX B. Statistical Tables for Multiple Regression
    Table IV-11 (A)
    Table IV-11 (B)
    Table IV-11 (C)
    Table IV-11 (D)
    Table IV-11 (E)
    Table IV-11 (F)
    Table IV-11 (G)

APPENDIX C. Source Listing for PHRASE

APPENDIX D. PL/I Source Program for PARSE

REFERENCES

TABLES

TABLE IV-1 HUMAN JUDGE AND JUDGE PAIR CORRELATIONS FOR ESSAY C QUALITY WITH PREDICTIONS FROM ESSAY D
TABLE IV-2 MEANS AND STANDARD DEVIATIONS OF THE PROX SCORES
TABLE IV-3 PROXES USED TO PREDICT A CRITERION OF OVERALL QUALITY (ESSAY D)
TABLE IV-4 THE DIRECTION OF CORRELATIONS OF PROXES WITH THE CRITERION: PREDICTED AND OBSERVED FREQUENCIES
TABLE IV-5 INTERCORRELATIONS OF THE PROXES FOR ESSAY D
TABLE IV-6 INTERCORRELATIONS OF THE PROXES FOR ESSAY C
TABLE IV-7 PROXES USED TO PREDICT A CRITERION OF OVERALL QUALITY (ESSAY C)
TABLE IV-8 MINIMUM MULTIPLE CORRELATIONS REQUIRED FOR SIGNIFICANCE AT THE .05 LEVEL
TABLE IV-9 MINIMUM MULTIPLE CORRELATIONS REQUIRED FOR SIGNIFICANCE AT THE .01 LEVEL
TABLE IV-10 MULTIPLE REGRESSION COEFFICIENTS CORRECTED FOR ATTENUATION CAUSED BY CRITERION UNRELIABILITY
TABLE IV-12 ESSAY D PROXES USED TO PREDICT AN ESSAY C CRITERION
TABLE IV-13 CROSS-VALIDATION COMPARISON OF THE COMPUTER WITH FOUR HUMAN JUDGES
TABLE V-1 INTERCORRELATION MATRIX FOR 32 RATERS FOR TOTAL SCORES
TABLE V-2 MEANS AND STANDARD DEVIATIONS OF THE FIVE TRAITS AND THEIR AVERAGE
TABLE V-3 PROX CONTRIBUTION TO THE PREDICTION OF IDEAS OR CONTENT
TABLE V-4 PROX CONTRIBUTION TO THE PREDICTION OF ORGANIZATION
TABLE V-5 PROX CONTRIBUTION TO THE PREDICTION OF STYLE
TABLE V-6 PROX CONTRIBUTION TO THE PREDICTION OF MECHANICS
TABLE V-7 PROX CONTRIBUTION TO THE PREDICTION OF CREATIVITY
TABLE V-8 PROX CONTRIBUTION TO THE PREDICTION OF AVERAGED RATING ACROSS 5 TRAITS
TABLE V-9 INTERCORRELATIONS OF THE TRAITS OF JUDGED ESSAYS
TABLE V-10 TRAIT BY ESSAY INTERACTION
TABLE V-11 COMPUTER SIMULATION OF HUMAN JUDGMENTS FOR FIVE ESSAY TRAITS
TABLE VI-1 RANK-ORDERING OF PREDICTOR VARIABLES
TABLE VI-2 MULTIPLE REGRESSION EQUATION FOR LINEAR TERMS AND INTERACTIONS
TABLE VII-1 TRITE PHRASES FOUND IN HIGH SCHOOL ESSAYS
TABLE VII-2 CORRELATION OF FIVE MAJOR TRINS WITH "OPINIONATION," "VAGUENESS," AND "SPECIFICITY"
TABLE VII-3 CORRELATIONS OF VOCABULARY MEASURES WITH "OPINIONATION," "VAGUENESS," AND "SPECIFICITY"
TABLE VII-4 DISCOVERED FREQUENCIES OF CORRELATIVE CONJUNCTIONS
TABLE VII-5 PARENTHETICAL EXPRESSIONS USED
TABLE VII-6 MEAN GRADE DIFFERENCES BETWEEN THE PARENTHETICAL AND NON-PARENTHETICAL GROUPS
TABLE VII-7 CORRELATIONS BETWEEN POSITIONS OF PARENTHETICAL EXPRESSIONS AND STYLE
TABLE IX-1 THE RELATION OF COMPUTER PARSING TO JUDGED GRAMMATICALNESS OF STUDENT SENTENCES
TABLE IX-2 MACHINE PARSING PERFORMANCE OF GRAMMATICAL AND UNGRAMMATICAL STUDENT SENTENCES

FIGURES

FIGURE II-1: POSSIBLE DIMENSIONS OF ESSAY GRADING
FIGURE III-1: GENERAL FLOW CHART FOR FIRST PROGRAM PROJECT ESSAY GRADE (ESSAY ANALYSIS)
FIGURE V-1: CRITERIA FOR RATING THE ESSAYS
FIGURE VI-1:
FIGURE VI-2: VARIABLE NUMBER 8
FIGURE VI-3: VARIABLE NUMBER 15
FIGURE VI-4: VARIABLE NUMBER 22
FIGURE VI-5: VARIABLE NUMBER 23
FIGURE VI-6: VARIABLE NUMBER 29
FIGURE VI-7: SAMPLE OUTPUT FROM TRANSFORMATION PROGRAM
FIGURE VIII-1: SAMPLE COMPUTER OUTPUT
FIGURE IX-1: COMPUTER LISTING OF HOMOGRAPHS FROM THE PARSING DICTIONARY FOR A STUDENT SENTENCE
FIGURE IX-2: FIRST COMPUTER ANALYSIS OF SYNTAX OF A STUDENT SENTENCE
FIGURE IX-3: LATER COMPUTER ANALYSIS OF SYNTAX OF A STUDENT SENTENCE
FIGURE IX-4: STATISTICAL INFORMATION PRODUCED BY THE MULTIPLE PATH SYNTACTIC ANALYZER (SYNTAX DIAGNOSIS)
FIGURE IX-5: FURTHER STATISTICAL INFORMATION PRODUCED BY THE MULTIPLE PATH SYNTACTIC ANALYZER (SYNTAX SUMMARY)
FIGURE IX-6: HYPOTHETICAL COMPLETE ESSAY ANALYZER

PREFACE BY THE PRINCIPAL INVESTIGATOR

This document, The Analysis of Essays by Computer, is primarily intended as the Final Report for the United States Office of Education, for a research contract which supported us during 1966 and 1967. Yet it also represents the first summary statement of all of the work undertaken since early 1965 at the University of Connecticut in such essay analysis, and in the simulation of human rating behavior.

It is difficult to trace the genealogy of any idea, let alone one as interdisciplinary as that underlying the present work. The notion of computer analysis of essays began to seem conceivable, following an invitational conference on data banks, led by John B. Carroll at Harvard University in December, 1964. My own experience had included work in many of the contributing fields, so that the manipulation of language, as described by Philip Stone and others there, drew together many threads into an eventually engrossing central problem.

From the moment of conception, this work has owed much gratitude to a succession of able and helpful people. J. A. Davis was immediately encouraging, as were Allan B. Ellis, William Asher, Dexter Dunphy, and Marshall Smith. John Duggan and John Valentine, of the College Entrance Examination Board, helped greatly in arranging almost immediate financial support. All that we did then and later owed much to this prompt generosity of the CEEB, and this report will also serve as the most unified summation of the earliest work done under that support.

Other generous support, supplementary to that of the U.S. Office of Education, has been given by the National Science Foundation, through its partial funding of the University of Connecticut Computer Center. Furthermore, the Massachusetts Institute of Technology was very helpful in supporting me as New England Visiting Scientist to their Computation Center during 1966-67. Finally, the University of Connecticut Research Council has given prompt aid at crucial times.

It would be impossible to list everyone who has been helpful with this Project, and there are sure to be important and unintentional omissions. Here at Connecticut, many ideas were early discussed with Herbert Garber, then with us in the Bureau of Educational Research, with Arthur Daigon, with Charles McLaughlin, and with Kenneth G. Wilson. These have all served as consultants for brief or longer periods of time, and many have contributed ideas or insights which, because of the nature of this report, are not acknowledged explicitly in the text. From the start, the Project had, as principal programmers, Gerald and Mary Ann Fisher. Mr. Fisher has been a consultant and, for the year 1967-68, a Research Associate with us. The programs from this employment have plainly been of central importance to the work.

In mid-1966 Dieter H. Paulus joined the Bureau of Educational Research, and has in many ways contributed richly to the work since that time. His various contributions are mentioned often in the text and he is second author of this report and partner in the on-going work.

Others who helped here in the Bureau of Educational Research were Miss Louise Patros, together with her willing staff of Mrs. Helen Ring, Miss Evelyn Haddad, and Mrs. Katherine Showalter. To Miss Patros much gratitude is owed for office management functions so important to a large research, and to all we are grateful for the preparation of this manuscript. Some of the research detail was carried out by graduate students here in the Bureau. Their names are mentioned in the text, together with their contributions, wherever these are included in the report. Among these, Donald Marcotte made contributions which were clearly outstanding.

During the work we have consulted many scholars from other institutions, formally or informally, and some of them should surely be listed here: Walter and Sally Y. Sedelow, Robert Stake, Paul Lohnes, Carl Helm, Arthur Jensen, Paul Diederich, Ross Quillian and Daniel Bobrow, Marvin Minsky, Arthur Anger, Bruce Ressler, John Moyne and David Loveman, Leslie McLean, William Cooley, John Carroll, Larry Wightman, Stanley Petrick and Jay Keyser. William McColly early provided us with the original data and worthwhile ideas. And Julian C. Stanley has served as a constant source of encouragement and inspiration.

Those readers seeking a shorter and more general introduction to this project are directed to the various publications by the workers, listed in the References. For a summary of this writing, they may wish to read the first section of Chapter IX of this report.

Ellis B. Page
Storrs, Connecticut

CHAPTER I

INTRODUCTION

When this research was proposed, the time surely seemed ripe for a much expanded study of computer analysis of student essays. In recent years rapid strides had been made in computer hardware technology, in the programming of language-data processing, and in linguistic analysis. More was known than formerly about the simulation of cognitive products and related fields. Many of the building blocks, therefore, appeared to be in place or nearly so. What remained was to thrust forward into the applied and basic problems of essay analysis and grading.

This study, therefore, aimed at advancing the knowledge of automatic essay analysis as far as theory, practice, and facilities would permit within the rather narrow span of time permitted. And this report will explain what was designed, attempted, and accomplished during this study period in this very new and potentially important field of research. It will also set forth current understandings about the most profitable avenues for further research.

And this first chapter will explain the background for the problem, both practical and theoretical, as well as the specific nature of the research attempted.

(A) The practical background. The practical problems of "objective" grading have long troubled education and the field of psychometrics generally. A single judgment of an essay by a single human judge is slow, extremely unreliable, and of uncertain status. When sufficient training is used, and a sufficient number of judgments establish a decent reliability, essay grading becomes prohibitively expensive. Psychometricians have therefore settled for multiple-choice items. These have the virtues of wide sampling, since more questions may be asked within a given time period; of high reliability; and of defensible validity, since scores often correlate as highly with judgmental ratings as the ratings correlate with each other under ordinary conditions. Nevertheless, educators are far from content with multiple-choice examinations as the ultimate criterion of achievement. They wish to call upon students for global, organized responses concerning large questions in substantive fields. They would like to ask, in testing self-expression, for direct demonstration of correct and literate usage. They are often not satisfied by the statistical evidence because of inadequate understanding of this evidence, and their incomprehension poses a problem for the psychometrician. More importantly, two objections to multiple-choice testing cannot be refuted comfortably at the present time: (1) One virtue of any test is the practice which the testing session gives the student. And it seems clear that the practice experiences of the student in taking an essay test are not precisely the same as in taking a multiple-choice test. (2) Another virtue of any test is the type of study which its anticipation motivates in the student before the test is administered. Many persons believe that students study differently for an essay test than for a multiple-choice test, differently for "recall" items than for "recognition" items. Clearer evidence on these two objections is needed, but their present status supports the desirability of finding some fast, reliable, inexpensive, and "objective" system of essay grading.

In English instruction especially, we have an example of a troubled field for essay analysis. Many believe that students need far more practice in writing essays in elementary and high school years. Yet writing without feedback seems generally pointless, and is surely objected to by the students concerned. And the feedback is very difficult to systematize. To do the ideal job in essay analysis, the high school English teacher would have to spend tremendous amounts of time out of class. Equalizing the load of the English teacher with his colleagues in other subjects is an unsolved problem. "Lay readers" are tried on an experimental basis in a number of schools, but these are an additional expense, are relatively untrained, and pose some large problems of coordination and aptness of judgment. Furthermore, the supply of qualified and interested English teachers has always been too limited. It is hoped that some way might be found to employ more broadly the talents of the few, so that individual judgment and correction of essays might be disseminated in the same way as lectures may be filmed or exercises may be printed in textbooks. A proper program for correction of essays would therefore be an attempt to amplify the effectiveness of the more intelligent and talented of graders and correcters. This study therefore aimed at the type of essay analysis most characteristic of English classes.

The input question. To solve any of these general practical problems would of course require practical input and output. At present, no computer does an adequate job of reading ordinary printing or typing, let alone ordinary handwriting, into correct card images for further data processing and analysis. Yet rapid strides are being made in such recognition, and one may hope for resolution of input problems before the judgmental problems are completely satisfied. The computerized optical reading of standard typescript may be only a very few years away. Or, for that matter, the gradual replacement of much of student handwriting in the schools by inexpensive and noiseless character printers (perhaps related to the present Stenotype machines) seems a plausible and perhaps early development. But even with the present necessity of key-punching IBM cards from student copy, practical input for computer grading is not wholly out of the question. For example, the cost of such key-punching ranges below $2.00 per essay. Such an input cost, while out of the question for daily classroom routine, would not be unreasonable for an occasional master analysis, serving as a basis for extensive descriptive or prescriptive reporting, for screening or placement, or for certain other types of evaluation or guidance activity. Indeed, present objective-test batteries often cost much more than that. For the purposes of this study, however, it was assumed that input had been transformed into punched cards or card images, and concentration was on the correction and evaluation problems themselves.

(B) The theoretical background. The rather momentous practical consequences of computerized essay grading will be some years away. Before these are felt, there were theoretical questions important to the study, and there are theoretical answers which may be furnished by the study. These were psychological and linguistic in nature. Psychologically, for example, what roles do the various prose characteristics play in the actual cognitive and affective rating processes? Actual manipulation of prose characteristics is not anticipated in the present design, and therefore direct causal relationships will not be inferrable, but some important implications for these processes may turn psychological experimentation into some fruitful channels. As a linguistic example, there is the additional understanding which may be gained of the nature of prose description. As Francis (1958) has pointed out, there are several kinds of "grammar": among them the prescriptive grammar, or "etiquette," of the schools, and the descriptive grammar characteristic of modern linguistics. (Also see "What Grammar?" by Gleason, 1964.) It may be noted that computer analysis of this proposed kind produces still another sort: a set of descriptions resulting from the computer's own peculiar limitations and abilities. A list of prepositions may be employed, for example, and any match with this list may cause a counter to be incremented. In such a program, some words will be counted which the competent human judge would classify in other ways: as adverb, subordinating conjunction, coordinating conjunction, etc. Yet from this NPREP count may result a description which would be impractical for human judgment, which is 100% reliable within the essay, which probably has high reliability across essays of the student, and which may be useful in predicting the qualitative human judgments of the essays.

Furthermore, it was intended to use certain extant computer analyzers from other researches, and this was done. These are efforts to perform linguistic analysis within the sentence, and they are inevitably limited in accuracy. The limitation in accuracy need not be a handicap, however, in terms of useful theoretical and practical description.

The important point here is that the computer may provide new measurements of language usage and these will have inevitable importance for theory building and basic discovery. These measurements do not presently carry heavy theoretical freight, only because they have not been observable within the traditional technology. (See later discussion on this point.)
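The preposition count (NPREP) described above can be illustrated with a minimal sketch. It is written in Python rather than the project's FORTRAN IV, and the word list, the tokenizing rule, and the function name `count_prepositions` are assumptions made for illustration; the list actually used by the project is not reproduced here.

```python
import re

# Hypothetical short preposition list, for illustration only.
PREPOSITIONS = {
    "about", "after", "at", "before", "by", "for",
    "from", "in", "of", "on", "to", "with",
}


def count_prepositions(essay):
    """Count every word that matches the list, regardless of its true function."""
    # Lowercase the text and split it into runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", essay.lower())
    # Each match with the list increments the counter.
    return sum(1 for word in words if word in PREPOSITIONS)


sample = "He went to the store before noon, but I had seen him before."
nprep = count_prepositions(sample)
```

In the sample sentence the final "before" functions as an adverb, yet it is counted all the same. The count is therefore a mechanical description that is perfectly repeatable for a given essay, not a true grammatical classification.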

More will be said in the final chapter about theoretical outlooks for such research. It is enough here to note that both practical and theoretical interests motivated the present study.

Related Research

The field of essay evaluation by computer represents a new focus within the (also new) field of computational linguistics, just as it represents a new and divergent speciality within educational measurement and educational technology. Like all promising new areas of scholarly investigation, however, it must draw heavily upon some combination of background disciplines not ordinarily considered together. This section on related research will consider some materials from these background disciplines.

(a) Background disciplines

(1) Psychometrics is a basic discipline within which any system of evaluation must be justified. The discipline already has achieved many technical skills (assessment of various forms of reliability and validity) necessary to proceeding with the study at hand. Some of the particular psychometric problems in content analysis are discussed in work by Dexter Dunphy (in Stone, 1966). Important background work dealing with the reliability of essay grading by human judges has been done by Diederich, French, and Carlton (1961), by Myers, McConville, and Coffman (1963), and by McColly and Remsted (1963), to name only three outstanding recent examples. In recent years essay testing has apparently seemed so unprofitable to psychometricians that it has been almost wholly neglected. For example, the index of a recent Review of Educational Research about testing had only one item referring to essay testing and it is negative: "problems of unreliability in grading" (Merwin and Gardner, 1962).

(2) Linguistics has potentially very high relevance to computer analysis of essay examinations. Important lines of study have of course emerged from the "generative grammar" thinking of Chomsky (1957) and others (e.g., Miller, 1962; Postal, 1964). The implications of some of these more scientific approaches to linguistics for a broader psychology of language have been recognized by Carroll (1964) and others.

Of course, the particular newer field of this discipline known as computational linguistics is more intimately related to the present phases of this work. And this field in turn has a large overlap with the field of list-processing (see below), and of information retrieval. Many of the most effective workers in these fields come not directly from linguistics training, but from mathematics, psychology, and computer science.

(3) Curriculum. Curriculum, in all fields using essay examinations, is a concern of central relevance to the study. This is especially true of language arts education, where there are tensions (Gleason, 1964) between the modern descriptive linguist and the traditional "prescriptive" grammarian (such as Hodges, 1951, or Warriner, 1951), and what should be taught in composition is by no means certain (Marksheffel, 1964). Eventually, decisions must be made about the "right" approaches for any computerized master analysis. But for a problem of optimization of simulation of human ratings, hypotheses from both camps appear useful, and may be empirically checked against the criterion. And some interesting light has been cast on certain questions of the "etiquette" grammar by work already done with this project.

Although the language arts curriculum is especially important, it is by no means unique. Within the present research design, the study should produce some interesting information for curriculum within other key disciplines (see the procedures), especially regarding the importance of special vocabulary.

(4) Automatic language-data processing has been well described by a number of writers (Green, 1963, ch. 13; Borko, 1962, pp. 336-423), but one of the best general accounts is by Garvin and others (1963). In general, there appear two major methods which are possible: one is the content-analytic approach, like that used in the "General Inquirer" (Stone, et al, 1966), and is more a "statistical" method; the other is more oriented to syntactic and semantic relationships, as are necessary to the machine-translation studies underway, and may be considered a more "linguistic" method. Both appear promising for essay grading. Of particular potential help appear to be certain grammatical-classification computer programs already devised: a part-of-speech decider which is about 95% accurate (Stolz, Tannenbaum, and Carstensen, 1965?), and a dependency classifier (Klein and Simmons, 1963), which lists the various different structures possible for a given sentence. Especially significant are two systems already tried with small subsamples of our data, programs by Kuno (1964), and by John Moyne of the IBM Boston Programming Center.

(5) Statistical methodology is like psychometrics in having a great body of well-developed doctrine and practice which may be brought to bear on the present problem. An optimization solution may be sought with some standard statistical techniques such as multiple regression (e.g., Cooley and Lohnes, 1962); or in some sequential, decision-making form, such as an operations flow with a series of choice points (cf. Simon, 1964); or in some combination of the two. The verbal protocols of human raters might lead eventually to some appropriate combination.

(6) Computer technology is very important in both hardware and programming. Advances in machine design, especially in larger memories and reduced costs, will make feasible the more complex grading programs at more economical levels. But present equipment is adequate for extensive exploration of the problem.

Great strides have also been taken in designing software suitable for language processing. List-processing third-level computer languages are especially appropriate, and at least three have been written which are extensions of the FORTRAN framework: IPL-V, SLIP (Weizenbaum, 1963), and DYSTAL (Sakoda, 1964). Another important list processing language is COMIT (Yngve, 1962a, 1962b), designed for such work as machine translation. A modification of COMIT has been made by Stone (1964) and his associates for the "General Inquirer" system at Harvard. (After considerable investigation of computer languages, the present programming was, except for minor subroutines, entirely done in FORTRAN IV. This decision makes possible maximum versatility, availability of programmers, and dissemination of programs.)

Two new developments in software promise increased ease of programming within AEC. One of these is STUFF (Puckett, 1966), which provides for string-manipulating functions embedded in FORTRAN IV. The other is in PL/I list-processing (Lawson, 1967), which is promised in an early implementation of the IBM 360 series (which was installed at the University of Connecticut in August, 1967).

One of the present lines of work in the field is that of the General Inquirer (Stone and Hunt, 1963; Stone, et al, 1966; Ellis, 1964; Ogilvie, Dunphy, et al, 1962). For certain purposes, a short dictionary of under 4,000 root words has accounted for 90-98% of the ordinary written language analyzed by General Inquirer (Dexter Dunphy and Marshall Smith, personal conference with the investigators December 22 in Cambridge, Mass.). Dictionary lookup procedures are crucial to language-processing, and recent developments of IBM research promise speeds of dictionary reference up to 10,000 words per minute (Philip Stone, 1964). As mentioned elsewhere in our proposal, studies by Simmons and others at System Development Corporation, by Stolz and others at Wisconsin, and by Kuno at Harvard have made progress in relevant software development.
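The coverage figure quoted above (the share of running words found in a dictionary) can be made concrete with a small sketch. This is not the General Inquirer's procedure: the function name `coverage`, the toy dictionary, and the tokenizing rule are assumptions for illustration, and no reduction of words to their roots is attempted here.

```python
import re


def coverage(text, dictionary):
    """Return the fraction of running words in `text` that appear in `dictionary`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    found = sum(1 for word in words if word in dictionary)
    return found / len(words)


# Toy dictionary, hypothetical; a working system would hold thousands of root words.
toy_dictionary = {"the", "cat", "on", "mat"}
share = coverage("The cat sat on the mat", toy_dictionary)
```

A set is used for the dictionary so that each lookup takes roughly constant time, which matters when every word of every essay must be checked.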

Still another major line of automatic language-processing appears to be the movement toward what may best be called "computational humanism," especially concerned with data processing to solve the kinds of problems (concordances, attribution, influence, style) usually associated with literary scholarship. This movement is rapidly gathering momentum with conferences, workshops and institutes, and a beginning literature, such as the recent book by Bowles (1967), or the emerging journal, Computer Studies in the Humanities and Verbal Behavior, now being printed by Mouton Press, of the Hague.

These six fields, then, contribute to the background expertise which is producing a new and potentially useful sub-discipline within educational research. The analysis of essays by computer is seen to be based upon a number of other disciplines, some going back into the nineteenth century, but others part of the general growth of behavioral science and computer technology within the last several decades.
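The multiple regression technique named under statistical methodology above can be sketched as follows. This is a minimal illustration in Python, not the project's FORTRAN IV programs: the two predictors (imagined here as a word count and a preposition count), the data values, and the names `fit_linear` and `predict` are all hypothetical. The fit solves the normal equations by Gaussian elimination.

```python
def fit_linear(rows, targets):
    """Least-squares weights (intercept first) via the normal equations."""
    # Prepend a constant 1.0 to each row so the first weight is the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(weights, row):
    """Weighted sum of the predictors plus the intercept."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Made-up scores on two predictors for five essays, with made-up judge ratings.
prox_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ratings = [1, 3, 4, 6, 8]
weights = fit_linear(prox_rows, ratings)
```

The made-up ratings were constructed to follow 1 + 2(first predictor) + 3(second predictor) exactly, so the fitted weights should recover those values. Real ratings would contain error, and the fit would then minimize the squared differences between predicted and judged scores.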

Objectives of the Research

In general, the objectives of the present study did not lend themselves to the clear, Fisherian, "classical" experimental designs, because not all operations could be fcreseen. It did, however, permit clear procedures of dynamic development and exploration at each stage of the study, and clear verification of accomplishment at the end.

Properly understood, these characteristics are not handicaps, but symptoms of large research scale. In a recent paper, Baker (1965) pointed out that the larger and more exploratory research project "must be inherently dynamic and possess the ability to change its internal structure without sacrificing the rigor of the design" (p. 15). And another writer (Doyle, 1965) has recently stated that as a study approaches the "basic research end of the spectrum, it becomes more and more imperative to be free to alter the plan. Indeed, in basic research altering the plan ought to be a state of mind." With the present work, it would be mistaken and even misleading to commit the investigation prematurely to too narrow a path.

In general terms, the objectives of the present study were as follows:

(1) To identify important characteristics of student prose which are analyzable through specially devised computer programs. These characteristics were to be aimed especially at predicting human judgments of content, organization, style, mechanics, and overall quality.

(2) To develop computer programs for measurement of these qualities, or variables related to them, as they occur in school essays.

(3) To analyze the computer-generated objective data in relation to subjective measures of the essay dimensions, in order to improve the differential accuracy of evaluating such essay dimensions.

(4) To develop through this procedure greater understanding of the human rating process, as applied to objectively describable prose characteristics.

(5) To study those aspects of essay description which appear most promising for useful feedback to the teachers and students. In other words, to begin exploration of the feasibility of computer commentary about student essays.

(6) To set forth larger strategies for the most promising future exploration of computer grading of essays.

This report tells about the pursuit of these objectives, in the following chapters.

CHAPTER II

THE BASIC DESIGN

Some fundamental strategies of investigation were designed early in 1965, and employed in the first data runs of Project Essay Grade (PEG I), financed primarily by the College Entrance Examination Board. But that study was intimately involved with the present one, and merged into it, and completely separate reporting of research done under the two sources of support would do some injustice to this continuity. Furthermore, although there has been much reporting of all of this work in professional publications, at scientific meetings, and in more popular news media, there has not been a disseminable technical report of any of it. Thus this report will at least touch upon all of the work to date.

Rationale

We should begin with a general rationale concerning the computer grading of essays. This presentation seems necessary for two reasons: (1) The computer analysis of essays seems to some a radical proposal, and is not treated elsewhere in psychometric literature. (2) The investigators intend the present project to open a larger exploration of such measurement and feedback, with possibilities not at all limited to the present work.

In general, then, there appear to be at least two dimensions of the problem of essay grading, with two general approaches in each dimension. In the first place, there is the content vs. style dimension. Are we interested in what the student says (e.g., about the discovery of America by Columbus), or in the way he says it (e.g., his use of punctuation)? We all know that these categories are not mutually exclusive, but they are useful concepts for our first orientation (Page, 1966).

In the second place, there is the dimension of rating simulation vs. master analysis. Are we interested in an actuarial approximation of the ratings of human judges (e.g., in certain words statistically associated with high ratings, even though not themselves regarded as an index of correct expression)? If so, we are essentially interested in rating simulation. Or are we interested in the computer doing a "reading" of language and performing a kind of informed and rational "judgment"? If so, we are speaking of the computer as master analyst, and of creating a kind of "artificial intelligence." These two dimensions are pictured in Figure II-1.

                              I. Content     II. Style

A. Rating Simulation             I-A            II-A

B. Master Analysis               I-B            II-B

Figure II-1. Possible Dimensions of Essay Grading

Clearly the columns of Figure II-1 are not going to remain unrelated to each other, since in some ways content and style are inseparable. And the column headings given are not completely satisfactory. Spelling, for example, is a consideration in Column II, yet "style" does not appear a satisfying rubric for the marking of spelling errors. Similarly, Rows A and B will not remain unrelated either. As the investigation of simulation discovers variables which are, empirically, more and more accurately correlated with human ratings, the analysis will become more profound and will grow closer to the "meaning" analysis eventually necessary in Row B.

The top row, then, suggests the "actuarial approximation" to judging the essay, and the bottom row represents the "master analysis" of the essay itself. These rows represent matters of computer strategy and objectives.

These rows need further explanation, because they are very near the heart of the problem, hence are crucial to understanding our progress to date in Project Essay Grade. What we have taken as our first goal is the imitation, or simulation, of groups of expert judges. How we reach this goal of successful imitation is not the central question, so long as it is reached, and so long as we can actually match or surpass the human judge in accuracy and in usefulness. In attacking the problem in this way we are clearly not doing a "master analysis" or generating measures of what the true characteristics of the essays are, as ordinarily discussed by human raters. Rather, we are content to settle for the correlates of these true characteristics.

To express this important distinction, we have been forced to coin two words: trin and prox. A trin is the intrinsic variable of real interest to us. For example, we may be interested in a student's "aptness of word choice," or "diction." A prox, on the other hand, is some variable which it is hoped will approximate the variable of true interest. For example, the student with better diction will probably be the student who uses a less common vocabulary. At present, the computer cannot measure directly the semantic aptness of expression in context, or "diction." But it can discover the proportion of words not on a common word list, and this proportion may be a prox for the trin of diction.

Or another illustration: We may be interested in the complexity of a student's sentences, in the branching or dependency structures which he has the maturity to employ. Such sentence complexity would, therefore, be a trin. But the sentence-parsing programs for computers which exist now are not completely satisfactory for our purposes. We might therefore hypothesize that the proportion of prepositions, or of subordinating conjunctions, constitutes a prox for such complexity. And we might therefore employ this proportion, too, in our computer analysis.

One more essential, and the basic strategy of our first essay grading project may be understood: We have begun by saying that the basic evaluation of overall essay quality must be human. But which human? If only one expert English teacher grades an essay, we know that the judgment will not be very dependable. We know that other judges will reach a somewhat different conclusion, and even the same judge, if he were grading it again, would probably shift his evaluation. The typical inter-judge agreement is represented by a correlation coefficient of only about .50. On the other hand, when a group of independent experts have graded an essay, and when these grades are averaged, this average has a rapidly improving dependability. When four judges, for example, grade an essay independently, their average judgment will correlate with the average of four other judges about .80. So it is possible to get reliable human judgment of essay quality. But it is extremely, prohibitively expensive and time-consuming when applied to any large-scale testing.

However, getting a reliable human judgment is not too expensive for a sample of essays. If we can find a way to imitate, then, what the expert human judges do with this sample, and if we apply this strategy to a computer program for a huge number of other essays, we capture high quality of judgment at low cost. And the techniques used to analyze the judgment and reproduce it are essentially those already so well developed in standard prediction problems.
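The two figures quoted above (about .50 for a single pair of judges, about .80 for averages of four) are consistent with the standard Spearman-Brown formula for the reliability of a pooled rating. The report does not name that formula at this point, so the following Python sketch is offered only as an illustration under that assumption; the function name is hypothetical.

```python
def pooled_reliability(single_r: float, k: int) -> float:
    """Spearman-Brown estimate of the reliability of the average of k judges.

    single_r is the typical correlation between two individual judges.
    Assumption: judges are interchangeable and rate independently.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * single_r / (1 + (k - 1) * single_r)


if __name__ == "__main__":
    # Show how dependability improves as more judges are averaged.
    for k in (1, 2, 4, 8):
        print(k, round(pooled_reliability(0.50, k), 3))
```

With a single-judge agreement of .50, pooling four judges gives .80 under this formula, which matches the figure in the text.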

The strategy, then, is very general indeed: if the computer may be programmed to simulate some sample, the resulting algorithm may be employed on arbitrarily large numbers of essays drawn from the same population as the sample. The validity of any evaluation and analysis will then depend on basic conditions which are already very familiar, from measurement work, to the psychometrician: on the number of judges used to establish criterion evaluations; on their quality; on the "set" of the judges; on the number of essays evaluated; on the nature of the essay sampling; on the frequency and consistency of the proxes; and so on. And powerful, well-understood statistical tools may be brought to bear on the simulation.

One technique for such simulation, where the appropriate weighting of each prox is unknown beforehand, would be the familiar multiple regression, in which one criterion variable (in this case the human judgment, or trin) may be optimally predicted by a discovered weighting of a number of predictors (in this case, the computer proxes). And indeed, this general tool of multiple regression, implemented by appropriate computer programs, has proved very powerful for essay grading, both in the initial strategies and in the later ones.

To summarize the general design, then: (1) Essays to be evaluated must (at present) be key punched for computer input. (2) These essays must be independently evaluated by human judges (of any desired characteristics), on various traits (depending on the research hypotheses). (3) Hypotheses must be generated by other human experts, concerning the programming of appropriate proxes for evaluation. (4) These hypotheses, depending on convenience and promise, must be programmed into the computer analysis. (5) The machine-readable essays are passed through the computer, and the proxes recorded for each essay. (6) These proxes are then optimized for the best possible prediction of the pooled human judgments.

The flexibility of the general design is clear. It allows for any appropriate selection of judges, any selection of proxes, of traits to be predicted, of essays, etc. Thus, this design has a great capacity for repeated use as our knowledge of essay grading broadens and deepens, and as its concerns expand to include all parts of the universe of Figure II-1.

In this study, the attention first focused on simulation of ratings of overall quality of style. Then the concentration shifted to ratings of various essay characteristics (content, organization, style, mechanics, and creativity). A variety of subproblems were considered, and hypotheses tested, and phrase-recognition procedures were implemented. And currently, attention is expanding to include subject-matter knowledge exhibited, and more intensive linguistic strategies. But the basic design is easily adapted to these and other shifts of focus, as research interests become more sophisticated, and exhibit greater breadth and depth. Indeed, even with the advanced strategies projected in the final chapter of this report, it is difficult to imagine a time when such actuarial strategies will not constitute an important part of some final decision process.

CHAPTER III

THE INITIAL PROXES

This chapter will describe more of the fundamental thinking to date about computer analysis of essays at the University of Connecticut. First this report will consider the 1965 work, which predicted judgments of the overall writing quality of a set of essays, and second the later expanded work, predicting a more complete profile of judgments on a number of essay characteristics or traits. This particular chapter will be concerned with the sampling, procedures, proxes, and programs devised for such analysis.

Sampling. The basic research design has been described in Chapter II. Since there was great flexibility permitted in selection of essays, and since the investigators were eager to explore the parameters of this field, a search was conducted for essays which would have certain desired characteristics. What seemed desirable were essays which (1) were already written under carefully described circumstances; (2) had ratings by multiple human experts already assigned, independently of one another; (3) were drawn from a student population heterogeneous enough to furnish a reasonable reliability for rating sums; (4) were long enough to furnish stable measurements of at least some prose characteristics; (5) were multiple for each student, so that some estimate could be made of test-retest reliability; (6) were general enough so that findings might have fairly wide applicability; (7) were accompanied by correlative information about the students; (8) were representative of a random sample of the target student population; (9) were large in number.

A sample of essays fulfilling most of these requirements was obtained in 1965 through William McColly, then of State University of New York, Oswego. For an earlier experiment in composition teaching, McColly and Remstad (1963) had arranged for English classes at Wisconsin High School (Madison) to write four essays, on four different topics, about one month apart. These had been indeed (1) written under carefully described circumstances; (2) given four independent ratings for "overall writing quality"; (3) drawn from a heterogeneous student population, representing grades eight through twelve, with an average IQ of about 114; (4) of an average length of over 300 words; (5) four in number for each student; (6) written on rather common themes, such as whether the "best things in life were really free", or whether "anger" could have good uses; and (7) accompanied by fairly extensive information about the student writers. Since they were from one (rather atypical) high school, they could not be said to represent a random sample from the secondary population of the United States. On the other hand, for such an exploratory research, the proposed experimental analyses were so broad that subtle interactions with ability levels, or with other levels of student population, were believed of small initial concern. Finally, the number of the essays was substantial, with well over 250 essays for each of the four writing sessions. For multivariate analysis especially, large numbers of cases are very important.

The question of interjudge reliability is of great importance, since any optimization technique, such as multiple regression, must have a decently reliable criterion if it is to produce any nonrandom results. The overall ratings assigned by the Wisconsin judges had an average interperson agreement of about .5, and an analysis-of-variance reliability for four such judgments pooled of around .83 (McColly and Remstad, 1963, p. 49). This high a reliability would give the sums (or averages) a sufficient stability for use as a criterion.

Hypotheses and proxes. Having defined the criterion and established a suitable sample, the next important task was to determine what hypotheses were appropriate, i.e., which of the available hypotheses could be shaped into suitable algorithms to provide proxes for the multiple regression. Clearly, it would have been nearly ideal if we could have incorporated into a massive computer program the whole of standard texts on usage and rhetoric, such as the Harbrace Handbook (Hodges, 1951). That is, in one sense, still the target of such work, but no one dreamed that anything approaching such a goal could be implemented into the study at such an early time. The problems were not simply economic and logistic. More importantly, they stemmed from fundamental uncertainty about the nature of language and of the human reading process. The present status of such work will be considered under suggested future strategies. Here shall be discussed the sort of thinking generated in conferences of consultants (Daigon, 1966).

The agreement between independent raters of the essays will indicate the degree to which the essays themselves (rather than the independent personalities, moods, biases, etc., of the judges) influenced the ratings. That is, the inter-rater agreement is a function of the physical influence of the word patterns of the essays. In principle, therefore, the computer is limited in its simulation of the group judgment not by any spiritual nature of the essay itself, but only by the extent to which the computer program can be designed to reflect the group responses (Page, 1967b).

These group responses may be presumed to be related to certain intrinsic characteristics of prose. These intrinsic characteristics may deal with mechanics, with organization, with diction, etc. They are described in detail in prescriptive grammars, and elsewhere, and may be further elaborated by the project's investigators and consultants. On the other hand, some characteristics of ultimate interest, some trins, may be unmeasurable with present knowledge and technology, and some possible approximation to them may be studied, in the hope that these second-order variables will be correlated with the trins.

As one example, spelling may be considered a trin, or almost so. The simplest effective strategy for analysis of spelling with available computer technology was to use a list of misspellings. A list of several thousand common spelling errors in their misspelled forms (e.g., Gates, 1937, with later supplement) will, consultants agreed, possibly account for many misspellings in high school papers. Each word in each essay may be looked up in such a computer-stored list, therefore, and a student's "misspelling score" augmented by one point whenever such a word is encountered for the first time. Not all student misspellings will be discovered by this method, but scores so generated would be correlated with the "true" spelling scores as might be discovered by human examiners, and any given misspelling is a trin. There are other available trins. Ungrammatical combinations of words, examples of generally poor diction, and other solecisms may be similarly discovered and tabulated from comparison with such lists, and may also be considered trins, considered individually.
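The list-lookup scoring just described can be sketched in a few lines of Python. This is only an illustration and not the project's program: the three-entry list borrows the misspelled forms THIER, BELEIVE and DONT quoted elsewhere in this report as a stand-in for the stored list of several thousand forms, and the tokenizer and function names are assumptions.

```python
import re

# Illustrative stand-in for the computer-stored list of misspelled forms.
MISSPELLED_FORMS = {"thier", "beleive", "dont"}


def tokenize(text):
    """Lower-case the text and split it into words (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def misspelling_score(text, misspelled_forms=MISSPELLED_FORMS):
    """Add one point the first time each listed misspelling is encountered."""
    seen = set()
    for word in tokenize(text):
        if word in misspelled_forms:
            seen.add(word)  # later repeats of the same form add nothing
    return len(seen)


if __name__ == "__main__":
    sample = "I beleive thier dog and thier cat dont bite."
    print(misspelling_score(sample))
```

Because apostrophes are kept inside tokens, a correctly written DON'T is not confused with the listed form DONT.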

On the other hand, what of the "less mechanical" questions of content, organization, thought pattern? Let us consider an example of a prox: The Harbrace College Handbook (Hodges, 1951) contains a chapter on "the paragraph." Surely the judgment of paragraph organization is one of the loftier goals to which the project may aspire, and a fully satisfactory simulation may be some good time away. But consider certain rules given by Hodges for the paragraph. His Rule 31b is:

Give coherence to the paragraph by so interlinking the sentences that the thought may flow smoothly from one sentence to the next. (p. 330)

This rule is of course too general to afford much help. But Hodges has given more prescriptive help in the five sub-rules [each provided with examples not reprinted here]:

(1) Arrange the sentences of the paragraph in a clear, logical order.

(2) Link sentences by means of pronouns referring to antecedents in the preceding sentences.

(3) Link sentences by repeating words or ideas used in the preceding sentences.

(4) Link sentences by using such transition expressions as the following:

ADDITION moreover, further, furthermore, besides, and, and then, likewise, also, nor, too, again, in addition, equally important, next, first, secondly, thirdly, etc., finally, last, lastly [etc., through other longer lists]

(5) Link sentences by means of parallel structure --that is, by repetition of the sentence pattern. (pp. 330-335)

These rules suggested some good researchable hypotheses.

Number (4), with its extensive list of words believed appropriate to link ideas in different ways, was the most convenient, and was researchable through a straight dictionary-lookup procedure like that used for spelling. The question is then to what degree such words may be a prox for the trin of paragraph organization. Similarly, Number (3) may be researchable, if the repetition of words is alone researched. The repetition of ideas would clearly depend on a dictionary or thesaurus beyond the scope of the immediate project. For Number (2), a prox might be the number or proportion of such pronouns occurring after the first sentence in any paragraph. (The complicated questions of pronoun reference again depend on distant developments in semantic and syntactic analysis.) Hodges' other rules may perhaps be approximated rather remotely, but argue for developing or adapting a syntactic sentence analyzer.

Another example of a trin was word fluency. This variable was clearly difficult to measure mechanically, since it would often depend upon semantic understandings, and these were generally beyond the scope of available technology. Nevertheless, possible proxes suggested themselves. Lists of "common words" exist (Lorge, 1959). The words of essay text may be looked up in such lists and, where unlisted, scored appropriately. The ratio of such unlisted words to total number of words may be included in the multivariate analysis to determine whether it aids in predicting evaluative rating. Or another approach, closer to a "content" analysis, would be to check for the presence of certain words suggested by dictionary or thesaurus as synonyms or near-synonyms of some thematic words. And extensive work of this kind is currently underway in a new phase of the research.
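Two of the dictionary-lookup proxes discussed above, the rate of transition words (Hodges' sub-rule 4) and the ratio of words not found on a common-word list, might be computed along the following lines. This is a minimal Python sketch under stated assumptions: the transition set is a small single-word subset of the ADDITION list quoted earlier, the common-word set is a tiny hypothetical stand-in for a published list, and all function names are invented for the illustration.

```python
import re

# Single-word subset of the ADDITION transitions quoted from Hodges.
TRANSITIONS = {"moreover", "further", "furthermore", "besides",
               "likewise", "also", "finally", "lastly"}

# Hypothetical miniature "common word" list, for illustration only.
COMMON_WORDS = {"the", "a", "is", "and", "of", "to", "in", "it", "was", "dog"}


def words(text):
    """Return the lower-cased alphabetic words of the text."""
    return re.findall(r"[a-z]+", text.lower())


def listed_rate(text, lexicon):
    """Proportion of the essay's words that appear in the lexicon."""
    ws = words(text)
    return sum(w in lexicon for w in ws) / len(ws) if ws else 0.0


def unlisted_ratio(text, common_words):
    """Proportion of the essay's words that are NOT on the common list."""
    ws = words(text)
    return sum(w not in common_words for w in ws) / len(ws) if ws else 0.0


if __name__ == "__main__":
    essay = "The dog was loyal. Moreover, it was brave."
    print(listed_rate(essay, TRANSITIONS))      # transition-word prox
    print(unlisted_ratio(essay, COMMON_WORDS))  # word-fluency prox
```

Both values are ratios to the total number of words, so that essay length does not contaminate them. Multi-word transitions such as "in addition" would need phrase matching, which this sketch leaves out.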

In short, the hypotheses for the trins underlying the human ratings were very numerous, and preliminary thinking of this sort, both initially and through the following two years of work, occupied a fair share of the time of consulting experts. As always with multivariate research, it would be far too cumbersome to recount the entire chain of thinking leading to each specific prox employed, yet some explanation will be included in the next section. The most obvious and general hypothesis for all trins was that the written papers receiving better human marks would tend to be in a style more conformable with the standard textbooks.

Hypotheses and proxes. The first 30 proxes which we settled upon grew out of several considerations: (1) We would first decide which trins were ideally measurable; but as we have seen, such a list included almost the entire handbook of usage, with most points defined very intuitively. (2) We would then decide what short-cuts might be taken to an approximation of such trins; where these were easily manageable, they would be programmed into the analysis. (3) We would furthermore have, from the nature of our text analysis, a number of variables which would be fortuitously and easily come by; and these might be examined routinely for possible assistance in prediction.

Ordinarily, as almost all methodologists believe (e.g., Tatsuoka and Tiedeman, 1963), research should be primarily theory-oriented, i.e., directed by hypothesis and associated deduction. Yet multivariate analysis does not really lend itself to complete explication and test of each separate hypothesis, and in general prediction research it would be unnecessarily and artificially restrained if it were not permitted use of any convenient predictors, regardless of the vagueness of rationale for their inclusion. There were in this study a fair number of what might be called, therefore, "proxes of opportunity." Some data about each of the initial proxes will be reported later. Here they will be listed, and briefly explained.
1. Title present or absent. It was early noticed that some students did write a title, and some did not. It was guessed, provided there were a fair division on this point, that the better students would be somewhat more apt to compose titles; and there would be therefore an expectable positive correlation with human ratings.

2. The average sentence length is a variable of considerable interest. If a sentence is defined the way the student writer defines it (that is, as a string of words between non-abbreviating periods), then there is not much evidence to expect more than a slight correlation with quality. Kellogg Hunt, for instance (1966), has shown that mean sentence length remains fairly constant with advancing school age. On the other hand, it might be supposed that a combination of sentence length and dependency relations would be reasonably important; that sentence length without such internal dependencies might be a sign of the poor writer, the run-on style; but that sentence length with such dependencies might be a sign of greater language maturity.

3. The number of paragraphs will often be very small for a really immature writer, just as other forms of linguistic markers and conveniences will also be underutilized. Thus it was predicted that frequency of paragraphs would be positively correlated with writing quality.

4. Subject-verb openings are the sentence beginnings where the subject phrase is apparently first. Without a parsing program, this variable was only approximated, and it was done so on the assumption that the first word would in the majority of cases be adequate for decision. Any pronoun, article, abstract noun, etc., will typically signal a subject opening, whereas an adverb, subordinating conjunction, etc., will typically signal a left-branching sentence. An essay's score on this variable, then, would be represented as a ratio of subject openings to total number of sentences. A common youthful failing is a stodgy, mechanical style without variation, while the sign of the more mature writer is a variety of sentence structures, depending on the purpose of the sentence. Therefore the prediction was that the subject-verb proportion would be negatively associated with writing quality.

5. Length of essay in words is surely a characteristic associated with advancing maturity and skill; and it is a commonplace correlative of high ratings from human judges. Here the prediction was that essay length would help in the prediction of the mark received, and would be positively correlated with writing quality.

6. The frequency of parentheses might be supposed characteristic, in a high school sample, of writing fluency. Among poor writers, many of these common tools do not seem to be a part of the available repertory, and it might therefore be predicted for parentheses, as for other marks of punctuation, that they would be positively correlated with writing quality. (Here and for similar subsequent counts, the frequency should be taken to mean a ratio of the item to the appropriate total of the essay. In this case, the number of words is used as the control for length. Otherwise, length of essay would be a hidden, contaminating factor in most of the proxes.)

7. Apostrophes are in a somewhat different category. While it is plainly more correct to write DON'T than DONT, it is somewhat better usage, or at least more formal usage, to write DO NOT. Frequent apostrophes might be supposed to mark a rather informal or casual style, and it might be supposed that informality is on the whole negatively regarded in a set theme assignment. On balance, therefore, apostrophes were predicted to correlate negatively with writing quality.

8. The frequency of commas might be the most reliable measure of the student's repertory of punctuation facilities, since commas are more common than any other mark. It was predicted, then, that comma frequency would be positively correlated with quality in a high school setting.

9. The frequency of periods is not, like frequency of commas, a mark of writing fluency, since it may be evidence of short sentences, or of abbreviations. Neither of these would be considered an asset in such a formal assignment.

10. The frequency of underlined words was predicted to be slightly, but positively, correlated with writing quality, under the simple assumption which also governed parentheses and commas. Similar predictions were made for the following punctuations:

11. Dashes

12. Colons

13. Semicolons

14. Quotation marks

15. Exclamation marks

16. Question marks

and, out of order:

26. Hyphens

27. Slashes

17. Prepositions are an interesting frequency. In the first place, it was not possible to design an algorithm to be very sure about the accuracy of category. For the initial programs, a word was a "preposition" if it was found in a computer-stored dictionary of prepositions, though to the human expert it might be serving as an adverb or subordinating conjunction, etc. Prepositions are common words, of course, yet it was predicted that they would be positively associated with writing quality, simply because their frequency would imply dependency substructures within the sentence. When sentence length is held constant, as was noted for #2 above, one might suppose that preposition frequency would vary positively with quality.

18. Connective words, such as nevertheless, however, and also, were assumed to characterize language marked by complexity of relationship, and thus were hypothesized to correlate positively with writing quality.

19. Spelling errors are of course the most obvious and objective characteristic of writing which is poor mechanically. In this test, no attention could be given to the errors which are simply misplaced homophones (such as THEIR and THERE), nor to other errors which were guessed low in frequency. Rather, the list consisted of some of the commonest misspellings which are wrong in any context (e.g., THIER, BELEIVE, DONT). And the assumed direction was that there would be a negative correlation between such occurrences and the human judgment of writing quality.

20. Relative pronouns are another set of words used by able writers to marshall and interrelate their thoughts. Therefore it was predicted that there would be a positive correlation between such words and essay quality.

21. Subordinating conjunctions were similarly expected to correlate positively with essay quality, for the same reasons as those above: that such words are important and relatively advanced tools for imbedding sentences and relating one thought to another.

22. The proportion of common words in an essay was determined by mechanically looking up each word in the Dale and Hall (1948) list of common words, and dividing the number of such occurrences by the total number of words in the essay. Setting aside misspellings (some of which would be caught by other dictionaries), we would expect that those essay words not on such a common list would probably be less frequent and more discriminating selections, and would usually represent better diction. Therefore we predicted a negative correlation between such common words and essay quality.

23. The occurrence of a sentence with a missing final period is very hard to find, with present computer programs. However, at the end of a paragraph, a missing period is obviously easy to detect, and this mistake does occur among very immature or careless writers. It would be predicted that where such an error did occur, it would be negatively correlated with writing quality.

24. This item, declarative sentences type A, and the next item, treat an attempt to locate sentences where question marks are mistakenly omitted. Any sentence ending with a period was here taken to be a "declarative" sentence. Then the first word is examined to ascertain whether the sentence might be interrogative in syntax. If the sentence begins with any of the common question introducers, such as WHO, HOW, WHERE, etc., it is taken to be a "declarative sentence type B," meaning that there is a boolean conjunction of a possibly interrogative first word with a non-interrogative terminal punctuation. A "declarative sentence type A", then, is one in which there is no evidence for interrogative sentence either in the first word or in the terminal punctuation. From this algorithm, then, the sentence is consistently declarative, and may be better correlated with the criterion than would be the type B sentences.

25. For these "declarative sentences type B," therefore, one might predict, if anything, a negative correlation with quality.

26. - 27. Punctuation marks, already discussed above.

28. The average word length in letters might be predicted of considerable actuarial importance, because we know from Zipf's law that word length is correlated with word rarity, and word rarity may be presumed correlated with broader vocabulary and more accurate diction. Thus the predicted relationship with quality would be positive.

29. The standard deviation of word length might be presumed to be highly correlated with the length itself, but it was thought that the additional information about dispersion might add to the total regression. This prox would also be predicted to correlate positively with the criterion.

30. The standard deviation of sentence length would not be presumed, necessarily, to correlate very closely with the length of sentence, since it is a common observation that many persons write consistently short sentences, or consistently long ones. What would appear ideal is a mixture of long and short sentences, as appropriate to the context, and one would therefore predict a standard deviation of sentence length which would be positively associated with quality.
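Several of the proxes listed above reduce to very small computations. The following is a minimal sketch in Python, not the report's FORTRAN IV program: the two word lists are tiny hypothetical stand-ins for the real dictionaries, the question introducers beyond WHO, HOW, and WHERE are assumptions, and the choice of a population standard deviation is also an assumption, since the text does not say which form was used.

```python
import statistics

# Hypothetical mini-lists; the report's actual dictionaries are not reproduced here.
QUESTION_INTRODUCERS = {"WHO", "HOW", "WHERE", "WHAT", "WHEN", "WHY"}
COMMON_WORDS = {"THE", "IS", "A", "OF", "AND", "WHO", "DOG", "RAN"}


def classify_sentence(words, terminal):
    """Return 'A', 'B', or None, following the rule described for proxes 24 and 25."""
    if terminal != "." or not words:
        return None  # only sentences ending in a period count as declarative
    # Type B: a possibly interrogative first word with a period at the end.
    return "B" if words[0] in QUESTION_INTRODUCERS else "A"


def common_word_proportion(words):
    """Share of essay words found on the common-word list (prox 22)."""
    hits = sum(1 for w in words if w in COMMON_WORDS)
    return hits / len(words)


def word_length_stats(words):
    """Mean and (population) standard deviation of word length (proxes 28 and 29)."""
    lengths = [len(w) for w in words]
    return statistics.mean(lengths), statistics.pstdev(lengths)
```

For example, `classify_sentence(["WHO", "IS", "THE", "DOG"], ".")` is counted as type B, since the first word suggests a question but the sentence ends with a period.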

In summary, these initial proxes were justified partly on rational grounds, partly on common sense observations, and partly by expert opinion. As we shall see later on, most of the predictions were discovered to be in the right direction, though not all; and some were considerably less or more effective than we had foreseen.

The Computer Program. Having decided upon the basic proxes for the first studies, it was necessary to choose a programming language for their implementation. This is not a trivial decision, since the world of "natural-language" programming, as it is called, has been and is a rather chaotic one. For some large-scale researches, through the past years of programming for natural language analysis, efficiency has been extremely important, both for time and money considerations. Consequently, some of the most important work in language translation (see Oettinger, 1960), linguistic analysis (Garvin, 1963; Borko, 1967), content analysis (Stone et al, 1966), and information retrieval (Becker and Hays, 1963) has been programmed in symbolic languages close to the machine, such as FAP or MAP. And these low-level languages not only make changes difficult and buggy, but also are extremely difficult to move from one machine configuration to another. Such programs are of little help to the new researcher in natural language work.

At the other extreme are high-level and sometimes quite abstract languages which have been used for frontier work in psychology, management science, linguistics, and artificial intelligence. Such languages are COMIT, IPL-V, DYSTAL, LISP, SNOBOL, and SLIP. These and others have been designed for list-processing, dynamic-storage applications, and often pay heavily in speed and convenience for the flexibility and elegance suitable to such applications. These were also surveyed rather extensively for any suitability for our system needs.

Ultimately, the choice of programming languages for such a purpose should be governed by these rather overlapping considerations: (1) Is it easy to program, and easy to modify? (2) Are the relevant programming skills already available in the research team? (3) Will the program in general outlive the rapid and inevitable machine changes across the years? (4) Will other researchers be able to adapt it easily? (5) Is it natural to our own systems tape? (6) Is it a mnemonic language, easy to comprehend?

In light of such considerations and after some false starts with COMIT, the investigators decided upon FORTRAN IV, for the following reasons: Our own computer installation at the University of Connecticut was at that time a rather new IBM 7040, with extensive FORTRAN IV facilities as part of the regular system tape. FORTRAN was the most widely used programming language in the computer world, with large numbers of available programmers. It furthermore promised to be available at almost all large computer centers for years to come. It is relatively machine-independent, with the exception of a few considerations of word-capacity and other matters.

Especially, FORTRAN seemed suitable because, when our problem was spelled out carefully, list-processing and dynamic storage were not yet necessary to anything we wished to accomplish. Such facilities are excellent conveniences for certain types of problems; but the better we came to understand our early needs, the more obvious it was that we needed the following:

(1) A way of organizing character strings into ordinary alphameric arrays, each row of such an array representing a recognizable "word", in the usual language sense. This organizer would also need to set aside punctuation marks and other non-words;

(2) A way of reading special dictionaries into immediate-access storage, for easy comparison with the words of the student text.

(3) A way of efficiently counting occurrences of any such dictionary words, for any student sentence and essay.

(4) A way of checking on various other, non-dictionary events in the student text.

(5) A way of summarizing the proxes for an essay.

These general goals are shown in only slightly more rigorous a way in Figure III-1, which is a flow chart of the first program outlines. Here it is seen that our dictionaries were input in punched cards, and were stored in core, in what are called double-precision arrays. For many readers, this requires some explanation. The core storage of the IBM 7040 was at that time limited to 32,000 computer registers, in which each register was limited to six characters of the alphabet, number system, punctuation set, etc. While the average English word (in running text) is between four and five letters in length, the average dictionary word (with small proportions of common words) will naturally be longer, and words will often be too long to fit within a six-character register. For this reason use was made of a facility of FORTRAN programming called "double-precision" addressing, which permits a set of two such six-character words to be addressed as if it were one. This scheme permitted English words to be packed in up to 12 characters, but truncated any words longer than 12.

FIGURE III-1

GENERAL FLOW CHART FOR FIRST PROGRAM, PROJECT ESSAY GRADE (ESSAY ANALYSIS)

(Begin)
Read dictionaries (punched-card dicts.) into double-precision arrays.
Begin essay i (taken from the rest of the essays).
Read sentence j.
Set words ijk in d-p array.
Compare word ijk with approp. dictionaries. Increment approp. counters.
Essay i done? Yes: Summarize counters for essay i. Store prox variables for essay i.
Job done? No: go on to the rest of the essays. Yes: (end).

Since each word was originally read in from a punched card, 80 characters in length, the first problem of processing a sentence was to reorganize these characters into words. Such markers as spaces and punctuation permitted identification of such words, and these were then "packed" from the loose original array, which was organized with one character in each computer register, into the denser 12-character registers. Then these text words could be compared with the dictionary words by comparing the first six-character register of each word. If a match were made, the second six-character register was also examined, and if another match were found, a hit was recorded for the particular list examined. This method of "packing" such words, then, permitted two economies: a large economy of space, since 1000 English words could be contained in only 2000 computer registers; and a large economy of time, since a match of the first six letters could be made in just one arithmetic comparison of one cell with another.
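The packing and two-stage comparison just described can be illustrated with a short sketch. This is Python rather than the report's FORTRAN IV, and the names `pack` and `matches` are hypothetical; only the six-character cell, the 12-character limit, and the order of the two comparisons are taken from the text.

```python
CELL = 6       # characters per register, as described for the IBM 7040
MAX_LEN = 12   # two registers per packed word


def pack(word):
    """Truncate to 12 characters, pad with blanks, and split into two 6-character cells."""
    padded = word[:MAX_LEN].ljust(MAX_LEN)
    return padded[:CELL], padded[CELL:]


def matches(text_word, dict_word):
    """Compare the first cells; examine the second cells only if the first agree."""
    t1, t2 = pack(text_word)
    d1, d2 = pack(dict_word)
    if t1 != d1:
        return False  # most mismatches are settled by this single comparison
    return t2 == d2
```

One consequence of the truncation is visible in this sketch: two words that differ only after the twelfth character, such as UNDERSTANDING and UNDERSTANDINGS, pack to the same pair of cells and are treated as the same word.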

As is shown in the figure, the student essays were also input in punched cards, and the eventual proxes were output in punched cards as well. (Later systems are tape-based.)

This original FORTRAN IV program, as modified and used throughout the length of this present report, is listed with considerable comment in Appendix A. Since the accompanying documentation is fairly extensive, we shall not describe the program in any great detail here, although it is obviously one substantial product of the work. In general, however, the effort was to make a program that would be: (1) efficient, so that expenditure of time would not be too great; (2) modular, so that it might be easily understood, and altered as circumstances would require; (3) general, so that dictionaries, numbers, functions could be easily changed; and (4) mnemonic, so that variable names would be reasonably easy to learn and remember. An example of the modular and mnemonic nature of the program might be seen in the function which searches for a given text-word in any particular dictionary. This function is called INTABL, and appears in statements of the form:

IF (INTABL (WORD, PREP, 100)) GOTO 900

Here the argument WORD refers to the particular essay word to which the DO loop has brought us in our data processing. Let us say that such a word might be AFTER. The argument PREP refers to the sub-dictionary containing prepositions, which is stored in core, and may be quickly searched. And the argument 100 is the (maximum) length of that list of prepositions. The function INTABL causes the program to transfer to a subroutine, which makes a search in that list called PREP for the word (in this hypothetical case, for the word AFTER). If the word is found in the list of prepositions, then the function INTABL is "TRUE," and the command of the IF statement is followed. In the present case, this means a transfer to statement number 900. If the word AFTER had not been found in this subdictionary, then the operation would have moved to the next statement following the IF, whatever that might be.

The manner of the search may also be of some interest, since dictionary look-up is surely one of the principal operations in the program. In a completely random sequence an exhaustive search would have to be made through the list in question; this would be much too inefficient. Rather, some advantage may be taken of the alphabetical sequence, and of the fact that the order of the letters corresponds with the size of the binary numbers in which the letters are represented. This means that early letters (such as A) will be represented by low binary numbers (with many zeroes). This also means that a word may be easily compared with a given spot in the list, and it may be said whether that word matches it, or may be earlier in the list, or later. This is sometimes referred to as "equal to-less than-or greater than" comparison.

Such a comparison permits several techniques. The most obvious is to plod through the list until the point is reached where the word should be, alphabetically speaking. If it is not there, then the operation may be returned to the main program, with the value FALSE. This technique of using the alphabet in a straight linear search will, then, obviously save about half the search time for the word in question.
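The alphabetical linear search described here might be sketched as follows. This is an illustration in Python, not the report's routine; the function name `linear_lookup` and the returned comparison count are assumptions added so the saving can be observed.

```python
def linear_lookup(word, sorted_list):
    """Scan an alphabetized list, stopping once the entries sort after the word."""
    comparisons = 0
    for entry in sorted_list:
        comparisons += 1
        if entry == word:
            return True, comparisons
        if entry > word:
            # The word would have appeared before this entry, so it is absent.
            return False, comparisons
    return False, comparisons
```

With a hypothetical ten-word preposition list beginning ABOUT, AFTER, AT, a search for AND stops at AT after three entries rather than reading all ten.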

A more advanced search technique, however, is what is called a binary search. This operates by going at once to the middle of the list, and making the comparison at that point. If the word is earlier, then the first half of the list is divided, and a comparison is made with the list at that quarter point. The list keeps being narrowed by half each time a comparison is made, so that very soon the comparison is narrowed to a single word: if the text word does not match the list at this point, the operation returns to the main program with the value FALSE. Such a binary search obviously capitalizes on the great economy of the exponential number. And this is an economy which rises rapidly as the dictionary increases in size. The number of comparisons made will be about the logarithm base 2 of the number of words in the dictionary. That is, if D is dictionary size, and $D = 2^n$, then n is the number of comparisons required, in the usual case, to ascertain whether any word is present in the dictionary. Then if a dictionary is 16 words long, about four comparisons will locate a word. This may not seem a large saving over the linear alphabetical search, when the time is added to compute the next comparison. But if a dictionary is 2,048 words long, a mere 11 comparisons will locate a word's proper space, and this binary search yields a great saving indeed.
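A binary search of the kind described can be sketched in a few lines. The Python function below borrows the name INTABL only for illustration; it is not the Appendix A routine, it omits the list-length argument of the FORTRAN call, and the comparison counter is an addition for demonstration.

```python
def intabl(word, table):
    """Binary search of an alphabetized table; returns (found, probes made)."""
    low, high = 0, len(table) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if table[mid] == word:
            return True, probes
        if word < table[mid]:
            high = mid - 1   # the word can only lie in the earlier half
        else:
            low = mid + 1    # the word can only lie in the later half
    return False, probes
```

In this particular variant, a table of 2,048 entries never needs more than 12 probes, which agrees with the estimate of about log2 D comparisons given in the text.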

Other lookup techniques, some even more economical in time, are discussed elsewhere (Hays, 1967, Chapter 5). Without such efficiencies as binary search, practical essay-grading would be prohibitively expensive. A number of other efficiencies were introduced into this program as well.

Preparation of the text. As we have said, eventual implementation will require some fairly direct input process from the student to the computer, at least for ordinary classroom use. For research purposes, we had the essays key-punched by clerks at the University of Connecticut, according to a fairly obvious format. Since at that time our key-punch machines had no upper-lower case differentiation, all typing was in capital letters. Also, the punctuation set was not complete, so that we employed the following conventions:

Name              Typewritten    Machine Convention

Period            .              .
Comma             ,              ,
Semicolon         ;              .,
Colon             :              ..
Exclamation       !              .X
Question Mark     ?              .Q
Italics                          (/)XXX
Dash                             ......
Apostrophe        '              @
Quote             "              *

Of course, these made no important difficulty in the programming, since two consecutive symbols are very easy to look for. In order to distinguish a period (abbreviation) from a period (end of sentence), we looked for two spaces after it, and took that to mean end of sentence. Similarly, a new paragraph was signalled by four blank spaces at the beginning of a new line.
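The two spacing conventions just described (two spaces after a sentence-final period, four leading blanks for a new paragraph) lend themselves to a brief sketch. The Python below is an illustration only; the function names are hypothetical, and it handles only the plain period, not the two-symbol codes such as .Q or .X.

```python
def split_sentences(text):
    """Treat a period followed by two spaces as the end of a sentence."""
    sentences, start = [], 0
    i = text.find(".  ")
    while i != -1:
        sentences.append(text[start:i + 1])
        start = i + 3
        i = text.find(".  ", start)
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # last sentence, with no following spaces
    return sentences


def is_new_paragraph(line):
    """A line beginning with four blank spaces signals a new paragraph."""
    return line.startswith("    ")
```

Because an abbreviation such as MR. is followed by a single space, it does not end a sentence under this rule.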

Key-punching of these essays proceeded at about the speed expected of ordinary typing, although verifying might take somewhat longer than ordinary proof-reading. What was more time-consuming was that the clerks were under instruction to type the copy literatim, that is, including every last mistake of the student in spelling, punctuation, and word order. This took time, of course, because it would be contrary to the habits of a career devoted to eliminating such mistakes.

The most important aspect of the text preparation was that nothing was done to the text which was not required for it to be machine readable. In no case was any human coding of it done for any purpose of the subsequent research (for example to identify verbs, nouns, etc.). This means that the copy to be read by the computer was in almost every obtainable way just what the student himself would presumably have written, if he had known how to typewrite and had typed it himself on the key-punch.

Summary. This chapter has elaborated the sampling, hypotheses, proxes, programs, and procedures for the investigation of machine analysis of essays. And the principal program so fundamental to the work is found in Appendix A of this report. The next chapter will treat some results of importance from such analysis.

CHAPTER IV

PREDICTING OVERALL QUALITY

This chapter will describe some of the findings, and implications of the findings, from the attempt to predict the rating of overall quality of writing. This describes work done in 1965, 1966, and 1967, largely concerned with the data from the Wisconsin study, which has formed a focus for much of the research on style up to the present time.

Human ratings. As has been made clear from Chapter II, the principal strategy of the work has aimed at the simulation of human judgments, and these human judgments are therefore very important. The instructions used for the ratings in Wisconsin were described by McColly and Remstad (1963). They asked for ratings on "overall quality", and they had four independent judges for each essay, and four essays for each student subject. The individual judges were qualified, but their personal characteristics are not of much importance for our study, and the so-called "individual" ratings represent a kind of statistical artifact. That is, when essays are regarded as rows, and the judgments are represented in four columns, each of these columns is a kind of composite, since it may contain ratings from many of the judges used in the Wisconsin study. Each particular element in the column is a rating by one human judge, but the column as a whole may be the contribution of many such judges.

With this understanding, it is still worthwhile to observe the agreement among these statistical judges. For our purposes, we chose two essays to focus upon, written about one month apart from each other. One was written on the question of whether the "best things in life were really free," and the other on the "uses of anger." These will be called Essay C ("Free") and Essay D ("Anger"). For Essay C, the interjudge agreement is shown in the upper-left quadrant of Table IV-1.

Here the kind of agreement among judges is shown which is usually found for independent, subjective evaluations where there has been a certain amount of coaching, here ranging around .50 correlation of each individual judge with his peer. It is to be expected that increasing the number of judges will increase the correlations, since it eliminates some random error from the judgments. To demonstrate this improvement, we have combined the columns of judgments in various ways, to find the effect of increasing the judges to two. When Column 1 (standing for the first column of ratings for Essay C) is pooled with Column 2, this sum, shown in Column 5, may be correlated with that of C3 + C4, shown in Column 6. The discovered correlation is .66, clearly higher than that between any two columns considered singly. Additional comparisons may be made in a similar fashion, when the sum of C1 + C3 is correlated with the sum for the other columns. In fact, it is obvious that $\binom{4}{2} = 6$ such comparisons can be made, and the results (in natural order) are: .66, .67, .70, .70, .67, .66. A more complete listing of such intercorrelations, between human judges, between human judge pairs, and between the single and combined columns, is shown in other cells of Table IV-1. Other parts of Table IV-1 will be discussed later in the chapter.

From psychometric theory, as well as from such empirical evidence, we would expect that the reliability of all four columns summed together would be higher still. When such a summation is done, however, it may no longer be correlated with others in the same fashion, since all of the data have been used. It is nevertheless possible to estimate such reliability through an analysis of variance of the columns, and such analysis was reported by McColly and Remstad, producing a reliability coefficient of about .83 for each of the two essays we are concerned with. Such a reliability is not very impressive for such an expensive rating process, but it is typical of such evaluation, and it does furnish an adequate target for the multiple regression of the proxes.

TABLE IV-1

CORRELATIONS FOR ESSAY C QUALITY: HUMAN JUDGE AND JUDGE PAIR, WITH PREDICTIONS FROM ESSAY D

(A 13 x 13 correlation matrix. Variables 1 to 4 are the single rating columns C1 to C4; the variables that follow are sums of column pairs, beginning with C1 + C2 and C3 + C4; the last rows include the predictions. The cell entries are not legible in this copy.)
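The expected gain from pooling judges can be made concrete with the Spearman-Brown formula, a standard result of psychometric theory. The report does not name this formula, so its use here is an assumption for illustration, and the input of .50 is simply the rough single-judge agreement quoted above.

```python
def spearman_brown(r_single, k):
    """Projected reliability when k comparable judges are pooled."""
    return k * r_single / (1 + (k - 1) * r_single)
```

With a single-judge correlation of .50, the formula projects about .67 for pairs of judges and .80 for four judges. These projections sit close to the pair correlations of .66 to .70 and the four-judge reliability of about .83 reported in the text.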

Having two different essays from each student writer, we may collect a certain amount of information about both individual and group stability across trials. Table IV-2 shows the means and standard deviations for the two student essays, first for Essay C ("Free"), then for Essay D ("Anger"). As explained previously, these proxes as shown here are not the raw frequencies for the essays, since such frequencies would have usually a large contaminating factor of essay length. Rather, they are the scores as converted to ratios and then multiplied to make a positive integer in each case. The transformation formulae are given in the FORTRAN program, printed in Appendix A. The proxes employed have been previously described in Chapter III, and the reasoning employed for each, together with a prediction of the anticipated direction of correlation. These proxes were measured in the D essays, using an earlier version of the program listed here in Appendix A, and these proxes were then used in a multiple-regression analysis to predict the human judgments for Essay D. Among the aspects examined were the correlation of each prox with the criterion, the beta weight contributed by each prox, and the test-retest reliability of each prox. This information is summarized in Table IV-3.

In this table, Column A lists the proxes by title, and in the same order as described in the last chapter. Column B shows the correlation of each prox with the criterion, which was the sum of four human ratings for each essay. And Column D indicates the test-retest reliability for the proxes, that is, the correlation between Essay C and Essay D for the proxes, as a measure of writing habit, or stability of writing behavior, in the student writers.

TABLE IV-2

MEANS AND STANDARD DEVIATIONS OF THE PROX SCORES

                                         Essay C              Essay D
Proxes                                Mean    St. Dev.     Mean    St. Dev.

 1. Title present                      .90       .29        .83       .38
 2. Av. sentence length             176.79     41.91     175.48     41.85
 3. Number of paragraphs              5.47      2.14       5.16      1.90
 4. Subject-verb openings            49.27     14.01      45.01     12.36
 5. Length of essay in words        397.40    112.62     361.32    104.44

 6. Number of parentheses             2.02      4.55       1.67      4.09
 7. Number of apostrophes             9.62      9.20       8.75      8.28
 8. Number of commas                 48.81     22.26      40.97     21.64
 9. Number of periods                56.70     12.36      58.25     12.20
10. Number of underlined words        1.74      3.65       1.42      3.26

11. Number of dashes                  1.97      4.50       1.37      3.39
12. No. colons                         .57      1.59        .47      1.72
13. No. semicolons                    1.54     ;4447       1.22      2.73
14. No. quotation marks             293.19    275.16     114.45    156.48
15. No. exclamation marks             8.61     23.36      11.06     34.00

16. No. question marks               53.35     70.97      25.46     49.37
17. No. prepositions                  9.73      1.76       8.90      1.80
18. No. connective words               .35       .51        .43       .58
19. No. spelling errors                .11       .31        .12       .34
20. No. relative pronouns             2.03      1.05       1.93       .96

21. No. subordinating conjs.          2.22      1.00       2.89      1.14
22. No. common words on Dale         81.89      4.58      79.17      5.09
23. No. sents. end punc. pres.       99.07      3.23      99.52      1.82
24. No. declar. sents. type A        92.45      8.48      95.48      6.35
25. No. declar. sents. type B          .56      1.63        .61      2.11

26. No. hyphens                       2.53      4.69       1.95      3.94
27. No. slashes                        .05       .39        .10       .59
28. Aver. word length in ltrs.      423.77     23.41     438.36     24.47
29. Stan. dev. of word length       217.72     20.31     232.32     23.13
30. Stan. dev. of sent. length       82.56     29.63      78.07     31.83

NOTE: These means and standard deviations are based upon the transformed scores, altered so that every individual score would be a positive integer, and would usually express a relative rather than an absolute frequency.

TABLE IV-3

PROXES USED TO PREDICT A CRITERION OF OVERALL QUALITY (ESSAY D)

A.                                     B.            C.        D.
Proxes                              Corr. with      Beta     Test-Ret. Rel.
                                    Criterion       Wts.     (Two essays)

 1. Title present                      .04           .09        .05
 2. Av. sentence length                .04          -.13        .63
 3. Number of paragraphs               .06          -.11        .42
 4. Subject-verb openings             -.16          -.01        .20
 5. Length of essay in words           .32           .32        .55

 6. Number of parentheses              .04          -.01        .21
 7. Number of apostrophes             -.23          -.06        .42
 8. Number of commas                   .34           .09        .61
 9. Number of periods                 -.05          -.05        .57
10. Number of underlined words         .01           .00        .22

11. Number of dashes                   .22           .10        .44
12. No. colons                         .02          -.03        .29
13. No. semicolons                     .08           .06        .32
14. No. quotation marks                .11           .04        .27
15. No. exclamation marks             -.05           .09        .20

16. No. question marks                -.14           .01        .29
17. No. prepositions                   .25           .10        .27
18. No. connective words               .18          -.02        .24
19. No. spelling errors               -.21          -.13        .23
20. No. relative pronouns              .11           .11        .17

21. No. subordinating conjs.          -.12           .06        .18
22. No. common words on Dale          -.48          -.07        .65
23. No. sents. end punc. pres.        -.01          -.08        .14
24. No. declar. sents. type A          .12           .14        .34
25. No. declar. sents. type B          .02           .02        .09

26. No. hyphens                        .18          .orr        .20
27. No. slashes                       -.07          -.02       -.02
28. Aver. word length in ltrs.         .51           .12        .62
29. Stan. dev. of word length          .53           .30        .61
30. Stan. dev. of sent. length        -.07           .03        .48

*Number of students judged was 272. Multiple R against human criterion (four judges) was .71 for both Essay C and Essay D (D data shown here). F-ratios for Multiple R were highly significant.

Overall prediction of the proxes. In multivariate analysis, it is often pointless to elaborate a hypothesis for each predictor, and to explain how each variable met expectations, or failed to do so. But it may be instructive to note how well the predictions fared as a whole. While some of the predictions were very tentative and loose, and while many of the variables obviously had only a non-significant relation with the criterion, some estimate may be made of the overall success of the predictions.

In general, the predictions were quite accurate, notwithstanding the obvious large random errors in the relationships which are evident in the table. The degree of success was examined and the results are shown in Table IV-4, which displays a contingency diagram for the direction of prox correlation with the criterion (positive or negative direction), and shows the relation between the predicted and discovered directions. Here the number of agreements is seen as 21, and disagreements 3. As is also shown in Table IV-4, the chi square was computed to be 3.12, which, with one degree of freedom and the assumption that a one-tailed test is appropriate for such agreement, is significant at the five per cent level of confidence. One may conclude, therefore, that most of the predictions were significantly in the correct direction.
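For readers who wish to see how a chi square for such a 2 x 2 contingency diagram is obtained, a minimal Python sketch follows. It uses the ordinary Pearson form without a continuity correction, which is an assumption about the computation; the cell labels a, b, c, d are hypothetical, and the report's own cell counts are not reproduced, so the sketch does not attempt to recover the value 3.12.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator
```

With one degree of freedom, a one-tailed test at the five per cent level corresponds to a critical value of about 2.71, which is why a chi square of 3.12 can be called significant under the one-tailed assumption stated in the text.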

Correlation with the criterion. It does not make much sense to describe a summary table in any detail, but it is useful to comment on a few outstanding points. As was explained in the last chapter, many of the predictors used were "proxes of opportunity", and it is not surprising that they were relatively unproductive. This is generally true for the large number of punctuation marks. The more major contributors to empirical prediction were usually foreseen.

TABLE IV-4

THE DIRECTION OF CORRELATIONS OF PROXES WITH THE CRITERION: PREDICTED AND OBSERVED FREQUENCIES

(2 x 2 contingency diagram of Predicted direction against Observed direction; the cell entries are not legible in this copy.)

N = 28, since two variables were not predicted.

$\chi^2 = \dfrac{N(|AD - BC|)^2}{(A+B)(C+D)(A+C)(B+D)} = 3.12$ (significant).

TABLE IV-5

INTERCORRELATIONS OF THE PROXES (first essay; matrix entries not legible in this copy)

TABLE IV-6

INTERCORRELATIONS OF THE PROXES (second essay; matrix entries not legible in this copy)

The average sentence length was anticipated, from other literature, to be more important in a multivariate than bivariate way, and this is apparently the case. But these higher-order relationships, which are only hinted by the beta weights and intercorrelations, are very difficult to articulate. The length of essay in words, number of commas, number of prepositions, number of connectives, relative pronouns, spelling errors, common words, and long words were all in the anticipated direction.

On the other hand, there were some surprises in the data. The number of question marks was predicted to be indicative of variety in style, yet was a negative predictor. On the second set of essays analyzed, however, it has moved from -.14 to .08, which implies that there may be an interaction of this feature with the wording of the assigned topics, or of the accompanying instructions to the student writers. Such an interaction becomes plausible in light of the interrogative wording of the "Free" question. Another surprise was the negative correlation with the criterion for variable 21, the proportion of subordinating conjunctions. The assumption that the proportion would reflect complexity, and that complexity would be related to maturity of style, was not destroyed, but it surely was shadowed by the negative correlation of -.12 for the D essays. Here an interaction with topic is not a plausible explanation, since for Essay C the discovered correlation had moved only slightly, to -.06. It is worth noting that for both essays, when the other predictors are taken into consideration, the beta weights for subordinating conjunctions are both positive. But here again, explanations of such higher-order effects are difficult to ascertain. It is probable that the explanations of this surprise should be pursued in the specific words in the list of subordinating conjunctions, and in further syntactic analysis of the sentences where they are used. This sort of exploration is further discussed in the final chapter of this report.

Surely the question of fluency is a most important one in the evaluation of essays. Two strong arguments for some general trait of prolixity appear in the importance of word length and essay length: the first being the highest correlate with the criterion, the second yielding the highest beta weight. And these relative positions are maintained for the C essays, as well.
So that such comparisons may be easily made, Table IV-7 contains the prox information for Essay C, the "free" essay. Column A has of course the title of the proxes. Column B shows again the correlation with the criterion. And Column C displays the beta weights for the proxes, when all variables are used to maximize the prediction of overall human judgment.

This table (IV-7) has the same status as Table IV-3, for the D essays. The D essays were presented first only because, historically, they were analyzed first. In fact, what they have in common is at once apparent to the naked eye. Most of the important correlations with the criterion are maintained in Table IV-7, and most of the important beta weights have sustained their contributions with the second essays.

Multiple regression. From the standpoint of overall simulation, the multiple correlation obtainable for the pooled human judgments is the primary goal of the analysis. For Essay D, the multiple-R achieved was a rather startling .71. And when it was possible to perform the same analysis for Essay C, although there were obvious changes as we have seen, the resultant multiple-R was once more (coincidentally) just .71. This coefficient means that for this set of proxes, and for these sets of essays, the correlation between the human ratings actually achieved, and the "predicted" ratings generated by the discovered beta vector, would be .71. Given the looseness of human rating, and the pooled human reliability of only .83, the multiple regression coefficient is encouraging in the extreme.
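How a prox can correlate negatively with the criterion and still carry a positive beta weight may be seen in the simplest case of two standardized predictors. The sketch below is illustrative only: the function name and the three input correlations are hypothetical and are not taken from the tables of this report. It uses the standard two-predictor formulas for beta weights and for the multiple-R.

```python
from math import sqrt


def two_predictor_regression(r1y, r2y, r12):
    """Standardized beta weights and multiple-R for two predictors.

    r1y, r2y: correlations of each predictor with the criterion.
    r12: correlation between the two predictors.
    """
    denom = 1.0 - r12 ** 2
    beta1 = (r1y - r12 * r2y) / denom
    beta2 = (r2y - r12 * r1y) / denom
    # With standardized variables, R squared is the sum of beta * r products.
    multiple_r = sqrt(beta1 * r1y + beta2 * r2y)
    return beta1, beta2, multiple_r


# Hypothetical values: predictor 1 correlates slightly negatively with the
# criterion, but is strongly related to predictor 2.
beta1, beta2, multiple_r = two_predictor_regression(r1y=-0.10, r2y=-0.50, r12=0.60)
print(beta1, beta2, multiple_r)
```

With these assumed inputs the first predictor receives a positive beta weight despite its negative bivariate correlation, which is the kind of higher-order effect that is hard to describe verbally.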

TABLE IV-7

PROXES USED TO PREDICT A CRITERION

OF OVERALL QUALITY (ESSAY C)

A. Proxes                            B. Corr. with    C. Beta wts.
                                        Criterion

 1. Title present                         .03             .06
 2. Av. sentence length                  -.07             .09
 3. Number of paragraphs                  .08            -.02
 4. Subject-verb openings                -.01             .09
 5. Length of essay in words              .25             .03

 6. Number of parentheses                -.05            -.05
 7. Number of apostrophes                -.16            -.09
 8. Number of commas                      .36            -.29
 9. Number of periods                     .01            -.01
10. Number of underlined words           -.06             .07

11. Number of dashes                      .31            -.15
12. No. colons                            .14            -.06
13. No. semicolons                        .09             .17
14. No. quotation marks                   .12            -.12
15. No. exclamation marks                -.04            -.09

16. No. question marks                    .08            -.05
17. No. prepositions                      .16            -.06
18. No. connective words                  .11             .10
19. No. spelling errors                  -.21             .01
20. No. relative pronouns                 .01             .10

21. No. subordinating conjs.             -.06             .25
22. No. common words on Dale             -.37             .15
23. No. sents. end punc. pres.            .12             .34
24. No. declar. sents. type A            -.00            -.05
25. No. declar. sents. type B            -.05             .11

26. No. hyphens                           .26            -.Le
27. No. slashes                           .03             .00
28. Aver. word length in ltrs.            .37            -.03
29. Stan. dev. of word length             .45             .26
30. Stan. dev. of sent. length            .08             .09

As is well known, however, we should not expect all of this accuracy if we took new essays and applied the discovered beta weightings to them, to predict their human ratings. For any set of scores, or any set of resultant correlations, contains not only true variance associated with the variable, but also a certain amount of error variance, random for the particular subjects concerned, which will not ordinarily be found with a new set of human subjects, or essays. The true variance gives us information which will be subsequently useful. But the error variance is also capitalized upon by the analysis, and a certain portion of the multiple-regression coefficient, and of the contributing beta weights, will spuriously seem to contribute, but will not stand up in a replication.

When one does run such an analysis, then, and subsequently cross-validates the weightings with new data, the resulting predictions will not correlate as highly with the criterion as one might hope. The statistical loss is commonly spoken of as "shrinkage" and has been widely treated in the literature (e.g., McNemar, 1962). Fortunately, empirical cross-validation is not always necessary, since the performance of such data may partly be predicted mathematically. As one would suppose, the larger the number of subjects, the more reliable the multiple-R will be; but the larger the number of variables (given the same number of subjects), the less reliable the multiple-R will be.

The Paulus tables. Since our work of essay analysis continues to be heavily dependent upon multiple regression, Dieter Paulus has made an investigation of the behavior of such data, given a varying N of subjects, and varying n of variables. Some of his findings are set forth in a usable form in Tables IV-8 and IV-9. Table IV-8 shows the minimum multiple-R coefficients required for significance (at the 5% confidence level) with different n and N. The number of predictors is scaled from 5 to 50, along the left hand column, and the number of subjects is scaled from 50 to 400, along the top. It may be easily seen, then, that for the present investigation, where predictors number 30 and cases just over 250, a multiple-R of about .41 is necessary for significance at the .05 level.

TABLE IV-8

MINIMUM MULTIPLE CORRELATIONS REQUIRED FOR SIGNIFICANCE AT THE .05 LEVEL

NUMBER                                  SAMPLE SIZE
PREDICTORS    50     75     100    125    150    175    200    250    300    400

     5      .4652  .3815  .3308  .2963  .2703  .2509  .2346  .2099  .1916  .1659
    10      .5898  .4861  .4221  .3788  .3460  .3215  .3008  .2694  .2459  .2131
    15      .6828  .5635  .4901  .4426  .4047  .3746  .3498  .3143  .2870  .2495
    20      .7565  .6282  .5485  .4942  .4513  .4180  .3925  .3521  .3217  .2790
    25      .8207  .6858  .5994  .5388  .4927  .4578  .4290  .3851  .3520  .3046
    30      .8751  .7337  .6429  .5778  .5301  .4929  .4622  .4151  .3796  .3287
    35      .9227  .7790  .6831  .6166  .5641  .5249  .4924  .4415  .4039  .3498
    40      .9623  .8189  .7203  .6492  .5958  .5536  .5183  .4648  .4265  .3707
    45      .9923  .8568  .7549  .6812  .6261  .5809  .5455  .4897  .4471  .3875
    50             .8906  .7875  .7118  .6540  .6074  .5694  .5115  .4684  .4063

TABLE IV-9

MINIMUM MULTIPLE CORRELATIONS REQUIRED FOR SIGNIFICANCE AT THE .01 LEVEL

NUMBER                                  SAMPLE SIZE
PREDICTORS    50     75     100    125    150    175    200    250    300    400

     5      .5312  .4388  .3819  .3433  .3135  .2907  .2724  .2444  .2231  .1933
    10      .6471  .5382  .4705  .4234  .3871  .3599  .3369  .3021  .2764  .2396
    15      .7322  .6107  .5345  .4837  .4437  .4114  .3839  .3452  .3160  .2741
    20      .7996  .6726  .5901  .5327  .4893  .4532  .4256  .3831  .3502  .3033
    25      .8572  .7257  .6397  .5764  .5293  .4917  .4621  .4155  .3809  .3301
    30      .9042  .7703  .6802  .6134  .5640  .5254  .4931  .4447  .4070  .3520
    35      .9444  .8116  .7184  .6510  .5966  .5574  .5247  .4728  .4319  .3739
    40      .9762  .8488  .7529  .6824  .6286  .5847  .5484  .4933  .4528  .3931
    45      .9968  .8825  .7852  .7125  .6575  .6113  .5738  .5166  .4734  .4122
    50             .9129  .8151  .74C8  .6829  .6355  .5958  .5390  .4943  .4296

Table IV-9 shows similar requirements for the .01 level of confidence, showing that around .44 is necessary to reject the null hypothesis. These Paulus tables are very convenient in dealing with large numbers of such coefficients, and seem to be a useful by-product of the present research. (For the computational reasoning, see Kelley, 1947, p. 475.)
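The entries of such tables can be related to the ordinary F test of a multiple correlation. The following is a minimal sketch under that assumption; the report cites Kelley (1947) for the actual reasoning, and the function names here are hypothetical. Given a critical F value from any standard F table, with n and N - n - 1 degrees of freedom, the minimum significant multiple-R follows directly, and the relation can also be inverted.

```python
from math import sqrt


def f_from_multiple_r(r, n_cases, n_predictors):
    """F statistic implied by a multiple-R with N cases and n predictors."""
    df_error = n_cases - n_predictors - 1
    return (r * r / n_predictors) / ((1.0 - r * r) / df_error)


def min_multiple_r(f_critical, n_cases, n_predictors):
    """Smallest multiple-R reaching a given critical F (inverse of the above)."""
    df_error = n_cases - n_predictors - 1
    return sqrt(n_predictors * f_critical / (n_predictors * f_critical + df_error))


# Table IV-8 entry for 30 predictors and 250 cases is .4151.
implied_f = f_from_multiple_r(0.4151, n_cases=250, n_predictors=30)
recovered_r = min_multiple_r(implied_f, n_cases=250, n_predictors=30)
print(implied_f, recovered_r)
```

The implied F for that table entry comes out at roughly 1.5, which may be compared against a published critical value for 30 and 219 degrees of freedom.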

There is another familiar problem in interpreting regression, however, and this one depends on the reliability of the criterion. It is obviously impossible to predict perfectly a criterion which is itself not perfectly reliable. And the reliability of a group of human raters obviously depends on the number of such raters and on their inter-judge agreement. As we have seen, the reliability of the group of four raters in Wisconsin was .83, and this means that about 31% of the variance ($1.00 - .83^2$) would be unexplained and indeed "unpredictable." When one is considering purely practical predictions for groups that are identical, it is reasonable to ignore this handicap. But when one is attempting to assess the "true" accuracy of a set of predictors, it is more fair to take such criterion unreliability into consideration.

Paulus designed Table IV-10 to do just this task.* The left column refers to the discovered multiple-R coefficient, and the top headings refer to the measured reliability of the criterion variable. Just by finding the appropriate cell of this table, then, one may infer what the discovered correlation might have been if the criterion had been perfectly reliable. This table was produced, like the two before, from equations programmed by Paulus for the time-sharing console in the Bureau of Educational Research at Connecticut. It was based upon the division of the multiple-R coefficient by the square root of the reliability of the criterion variable (Kelley, 1947, p. 412).

TABLE IV-10

MULTIPLE REGRESSION COEFFICIENTS

CORRECTED FOR ATTENUATION

CAUSED BY CRITERION UNRELIABILITY

Discovered              Reliability of Criterion Variable
MULTR
Coefficient   .35  .40  .45  .50  .55  .60  .65  .70  .75  .80

    .50        85   79   75   71   67   65   62   60   58   56
    .51        86   81   76   72   69   66   63   61   59   57
    .52        88   82   78   74   70   67   65   62   60   58
    .53        90   84   79   75   71   68   66   63   61   59
    .54        91   85   81   76   73   70   67   65   62   60
    .55        93   87   82   78   74   71   68   66   64   61
    .56        95   89   83   79   76   72   69   67   65   63
    .57        96   90   85   81   77   74   71   68   66   64
    .58        98   92   86   82   78   75   72   69   67   65
    .59        99   93   88   83   80   76   73   71   68   66
    .60             95   89   85   81   77   74   71   69   67
    .61             96   91   86   82   79   76   73   70   68
    .62             98   92   88   84   80   77   74   72   69
    .63             99   94   89   85   81   78   75   73   70
    .64                  95   91   86   83   79   76   74   72
    .65                  97   92   88   84   81   78   75   73
    .66                  98   93   89   85   82   79   76   74
    .67                  99   95   90   87   83   80   77   75
    .68                       96   92   88   84   81   79   76
    .69                       98   93   89   86   82   80   77
    .70                       99   94   90   87   84   81   78
    .71                            96   92   88   85   82   79
    .72                            97   93   89   86   83   81
    .73                            98   94   91   87   84   82
    .74                            99   96   92   88   85   83
    .75                                 97   93   90   87   84
    .76                                 98   94   91   88   85
    .77                                 99   96   92   89   86
    .78                                      97   93   90   87
    .79                                      98   94   91   88
    .80                                      99   96   92   89
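The correction just described, division of the multiple-R by the square root of the criterion reliability, is simple enough to sketch directly. The function name below is hypothetical; the input values are those used in this chapter.

```python
from math import sqrt


def correct_for_attenuation(multiple_r, criterion_reliability):
    """Estimate the multiple-R against a perfectly reliable criterion."""
    return multiple_r / sqrt(criterion_reliability)


# A discovered coefficient of .67 with criterion reliability of .80.
corrected = correct_for_attenuation(0.67, 0.80)
print(round(corrected, 2))
```

With a reliability of .80 this gives about .75, in agreement with the corresponding cell of Table IV-10; a higher reliability, such as the .83 reported for the four raters, would give a somewhat smaller corrected value.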

Still another table serving such needs was designed to perform automatic "shrinkage" of multiple-R coefficients. As we have noted, when MULTR is calculated, it finds a maximum fit of weightings to the sample data. But the sample data do not reflect merely the true covariances of the population. They also reflect random error typical only of the cases constituting the sample, and the computational method capitalizes upon such random error, just as it capitalizes upon the true covariance. And such random error increases rapidly as n, the number of predictor variables, increases. As we have also noted, however, the sample size tends to counteract this mounting random error. The "shrunken" multiple-regression coefficient, then, is the statistical estimate of what the coefficient would have been, if it had not capitalized on such random error. It is therefore, of course, always smaller than the observed coefficient.

There are several formulas available for such shrinkage. Perhaps the most appropriate one is the Wherry formula (Kelley, 1947, p. 474), expressed by:

$$R_s^2 = \frac{(N - 1)R^2 - n}{N - n - 1}$$

where $R_s$ is defined as the shrunken coefficient, $R$ is the discovered coefficient in the sample, $N$ is the number of persons (cases) in the sample, and $n$ is the number of predictor variables. The Wherry formula expresses what is believed to be the "true" multiple-R coefficient in the population of interest (rather than in some other sample from that population). This formula thus seems most appropriate for such exploratory research, where the population parameters are indeed of central interest. And it was therefore programmed into the computer to produce the various tables for shrinkage.
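As a check on the Wherry formula given above, a short sketch may be applied to the figures of the present study. The function name is hypothetical, and the sample size of 276 is taken from the total sample reported later in this chapter.

```python
from math import sqrt


def wherry_shrunken_r(r, n_cases, n_predictors):
    """Shrunken multiple-R by the Wherry formula.

    Returns 0.0 when the estimated squared coefficient is not positive.
    """
    shrunken_sq = ((n_cases - 1) * r * r - n_predictors) / (n_cases - n_predictors - 1)
    return sqrt(shrunken_sq) if shrunken_sq > 0 else 0.0


# Discovered R of .71 with 30 proxes and 276 essays.
shrunken = wherry_shrunken_r(0.71, n_cases=276, n_predictors=30)
print(round(shrunken, 2))
```

This yields a shrunken coefficient of about .67, the value reported in this chapter for the present data.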

These tables are listed in Appendix B, since they are too large to include conveniently in the running text. In Appendix B, the tables are divided according to size of n, the number of predictor variables. These sub-tables are as follows:

Organization of Table IV-11 (See Appendix B)

Sub-Table    Number of Predictors

    A                25
    B                30
    C                35
    D                40
    E                45
    F                50
    G                55

To avoid too massive a document, Paulus restricted the size of n, therefore, to the range from 25 to 55. He also restricted the sample size to a range from 100 to 300. Both of these constraints mean that the tables are very appropriate for studies of the present size, and a large number of empirical studies seem to fall within these limits.

Use of the tables. In the present case, we can immediately apply certain of these tables to the discovered data. We would ordinarily shrink the MULTR before we would correct it for attenuation; therefore we would enter Table IV-11(B), with n = 30, N greater than 275, and a discovered R of .71. The appropriate cell yields us a shrunken R of .67. Then we may enter Table IV-10 with a new "discovered R" of .67, and a known criterion reliability of over .80. In the appropriate cell we find a coefficient of .75, when first shrunken and then corrected for attenuation.

In this use as in other uses of such tables, the user should always remember that these table cells are generated from formulae which inevitably make certain theoretical assumptions about the distribution of the data. One will not necessarily expect, for instance, that cross-validation with another sample will match the Wherry shrinkage with any exactness. In the first place, violations of assumed distributions will often cause a greater shrinkage through cross-validation than one would expect. On the other hand, the prediction of a new sample will often be, for the reasons touched on above, lower than the corresponding prediction would be of the population itself. But these tables can surely supply rather good approximations to the statistics which we may be very much interested in, but cannot measure directly.

Reliability of proxes. To some extent, validation with subsequent samples will depend upon the reliability of the multiple correlation, and this will depend in part upon the reliability of the individual proxes. As already noted, the reliabilities of these proxes are shown in Column D of Table IV-3. The coefficients of Column D are the product-moment correlations between the different essays for a particular prox. For example, for the second prox listed, "average sentence length," the coefficient represents the similarity between these averages for two different essays written about one month apart. Thus the correlation is an extremely conservative one, and seems a reasonable measure of "writing behavior" under two separate (but quite similar) stimulus situations.

A few generalizations may be made about the prox reliabilities. In the first place, it seems that there is a correlation between Column B and Column D. Those proxes with the highest reliability are also typically those which aid most in the prediction of overall quality. The highest reliabilities seem to belong (in descending order of magnitude) to: the proportion of common words (#22), average word length (#28), average sentence length (#2), proportion of commas (#8), and length of essay in words (#5). With the exception of average sentence length, these same proxes are among the best (bivariate) predictors of writing quality, and even average sentence length is among the more substantial contributors in the combined, multivariate prediction.

A second generalization about such proxes is that their reliabilities may be related to the frequency of their occurrence. Those proxes which deal with the most frequent events, such as average length of word, or proportion of common words, may have the highest reliability. Sentences, which are also found in a fair number within an essay, have a fairly stable reliability for average length (.63). And paragraphs, which are less frequent in an essay than sentences, have a frequency reliability which is somewhat lower (.42). On the other hand, the writing of a title, which is a behavioral decision which occurs only once in the writing of each essay, has a practically non-existent reliability. This is a generalization which is still very tentative, and deserving of more exploration.

A third generalization, really a speculation, is that there may be a significant interrelationship among the reliability of the prox, the beta weight of the prox for a particular essay, and the worthiness of the prox for assessing the more stable writing behavior of the student. It is worth studying to find out whether prediction of future essays may be improved by modifying the beta weights in accordance with the reliability of the proxes. This possibility has not yet been analyzed within this project, but is promising for the future, for practical prediction purposes.

A final interesting speculation concerns the relationship between the reliabilities of the proxes and the reliability of the total multiple-regression equation. It is a familiar observation in mental testing that the total score of a test, which is often a sum of various part scores, will frequently be more reliable than any of those part scores taken separately. But it might not be so obvious that the same phenomenon may occur in multiple regression, that the total predictive validity may conceivably be higher than the reliability of any of the contributing predictors. This appears to be the case here; but the mathematical aspects of this problem will not be analyzed within the scope of this report.

Human and machine judgments. Now it would be valuable to return to a further analysis of Table IV-1, since it has much to tell us about rater performance. Earlier in this chapter, it was noted that the upper left quadrant (for Columns 1-4) shows us the intercorrelations among the judge columns for Essay C. Columns 5 through 10 show the increased accuracy, or reliability, which may come from increasing the number of judges.

In this portion of the table, many of the coefficients are of course inflated artificially through a part-whole agreement. Columns 1 and 5, for example, agree at a level of .88, but since Column 5 is simply the sum of Column 1 and Column 2, this has little empirical meaning.

Whether a coefficient is so contaminated may be at once determined by reading the variable names in the leftmost column of the table.

Column 11 represents the sum of all C ratings, and therefore the agreement coefficients between 11 and all of the earlier columns are similarly a part-whole artifact. On the other hand, Columns 11, 12, and 13 do have considerable significance when properly understood. Column 12 represents the sum of all D ratings, and is therefore the best measure of external validity which one could wish for the various ratings given by the human judges to the C essays. And Column 13 represents the machine evaluation, derived from multiple-regression analysis of the proxes for Essay D. This information has particular meaning for this project, as is here explained.

Human vs. machine "validity". An interesting sidelight is cast upon the human vs. machine comparison by looking at some analyses of the human judges of essay C compared with human and machine judgments of essay D. The most important meaning of "validity", for an essay test, would appear to be how well it predicts performance on another essay by the same student writers. That is, in the long run we are less interested in how reliable this particular judgment of performance is, and more interested in how well it assesses the student's general writing performance, under somewhat differing circumstances. One important measure of this validity, then, would be agreement of ratings with those of other essays by the same students.

We would always expect such validities to be lower than the agreement between raters on the same essays, for not only would the ratings differ because of rater error (or viewpoint), but they would differ also because of intrinsic differences in performance of the student under two sets of conditions. One interesting comparison of the machine and the human judge, then, would be to match each with the ratings of the expert group for some second essay.

This comparison was simplified for the present study because, as we have seen, multiple essays were analyzed for the same student writers, the "Free" essays and the "Anger" essays. We have seen how the individual judges (or their statistical summations) agreed with each other in Table IV-1, on the C essays. Now it would be instructive to see how well each of those "individual" judges predicted the ratings on the D essays. For this comparison they are correlated with those ratings given by the group of four raters on the "Free" essays, so that their coefficients each represent the correlation of an individual with a group of individuals. And we may therefore expect the correlations to be higher than those between pairs of individuals, since some of the error (but by no means all) will be eliminated by the larger number involved in the group sums. The coefficients are also somewhat higher than they should be, in one sense, since some of the same judges were involved in evaluating both C and D essays.

When such comparisons are made, the four judges of C are found to correlate with the pooled judgment of D as follows: .56, .59, .60, and .42. These coefficients produce an average of about .54 between these two essays.

On the other hand, we could look at the predictions of the D essays generated by prox analysis of these same essays, resulting from the multiple regression programs. These predictions become a (reasonably) independent way of estimating how well the student might do on another essay. When these machine predictions of D, then, are used to predict the students' actual performance on the C essays (that is, the pooled expert judgment of such performance), the coefficient is .53. This coefficient is almost precisely that of the typical human judge, and once again shows us how similar to the human individual is this first approximation of a machine system.

This finding also furnishes another response to the critic who supposes that the measures found through such statistical procedures are entirely artifactual, and will disappear upon validation. There could hardly be any measure of validity of essay rating superior to this one, and on this measure the machine performs, even in this early state, as well as the expert human.

Human vs. machine accuracy. There are two elements of "accuracy" of rating which are submerged in the data, and for which there is no readily available statistic in common use. Both of these hidden statistics are extremely important and, as it happens, both would argue additional advantages on the side of machine grading, in almost any practical situation.

When students speak of "fairness" in grading, they ordinarily are not speaking primarily of any correlation with some true score, as much as they are speaking of absolute comparisons with such a true score. As we know, the common correlational methods suppress both mean scores and score variances, in order to make the comparison on standard scores alone. Therefore it would be possible to have two human raters "agree" perfectly, in terms of correlation coefficient, in that r might equal 1.00. Yet they might have not one rating in common. This would be the case if two teachers assigned the identical rank orders to a set of students, yet one assigned marks just one grade lower than the other teacher. To the typical student, such a question of "hard" or "easy" marking would be much more important than minor differences in correlation.
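The case of two teachers who agree perfectly in correlation, yet never assign the same mark, can be made concrete with a small sketch. The grade values are hypothetical (4 = A down to 0 = F), and the product-moment correlation is computed by hand so that no outside library is assumed.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical marks for six students; the second teacher grades every
# student exactly one grade lower than the first.
easy_teacher = [4, 4, 3, 2, 2, 1]
hard_teacher = [grade - 1 for grade in easy_teacher]

r = pearson_r(easy_teacher, hard_teacher)
marks_in_common = sum(a == b for a, b in zip(easy_teacher, hard_teacher))
print(r, marks_in_common)
```

The correlation is 1.00 while the number of identical marks is zero, because the correlation discards the difference in mean grade.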

Another aspect of accuracy or "fairness" to the student is the range of marks assigned. The student at the bottom is very concerned whether the teacher is one who fails students often. And the student at the top feels that it is "unfair" if there is not a reasonable probability of his getting an A. If an "accurate" or "fair" grade is regarded as that which would be assigned by a group of experts, acting independently of one another, then these questions of mean grade, and of grade dispersion, are very important to such accuracy.

In this way, of course, the machine can be incomparably superior to the human. Both mean grade and grade deviation may be determined entirely on the basis of the expert group, and if desired remain fixed for any group of students for whom the system is applied, regardless of the size of the group. We take such standards for granted with the national standard scores of such instruments as the Scholastic Aptitude Tests, yet ordinarily we despair of trying to achieve the same fairness with any marks administered on a local level. With the introduction of machine essay grading, it appears likely that the parameters of evaluation may be uniformly adjusted to any standards found appropriate.

In the question of accuracy, then, as this quality is ordinarily thought of, the automatic system has some large advantages over the human system, and these are advantages which cannot be easily demonstrated in statistical comparisons. But they should be kept in mind for any thinking about applications.

Using one essay's proxes for another essay's criterion. One way to find out what proxes might have the greatest stability, in terms of measuring important aspects of a student's characteristic writing behaviors, might be to use the proxes from one essay in a multiple regression for another essay by the same students. The reasoning may be obvious: There are certain aspects of student behavior which might influence the human judgment of one essay, but not be much related to the student's long-range performance. If we cross the proxes and criterion, then, in the way described, we may be tapping more enduring aspects of writer behavior.

To investigate this question, the proxes from Essay D were used in a new multiple-regression analysis to predict the pooled human judgments for overall quality for Essay C. The resulting MULTR was .62, which as would be expected was a considerable drop from the .71 obtained with Essay C's own proxes. Table IV-12 shows the summary data of interest from this analysis. Column B represents the correlation of the proxes with the criterion, and Column C represents the beta weighting of each prox in the analysis.

Inspection of these columns, and a comparison of them with their counterparts in Table IV-7 and Table IV-3, do not provide any very transparent explanation for the decrement in prediction. A hint may be gained from the slightly lower correlation of essay length with the criterion; students may have more to say on one subject than on another, and this fluency may affect the rater's judgment. And the beta weight for essay length has also dropped markedly (from .32 for Essay D's own criterion, to .21 for Essay C's criterion), which bolsters this suggestion. A comparison of another contributor, standard deviation of word length, shows a similar decrement in beta weight, but an actual increase in the bivariate correlation with the criterion, compared with the Essay D table (IV-3).

In summary of this trial, then, the data are difficult to interpret verbally, but seem to argue that the decrement in multiple correlation may be a reflection of the true difference in student performance across essay topics.

TABLE IV-12

ESSAY D PROXES USED TO PREDICT AN ESSAY C CRITERION

A. Proxes                            B. Corr. with    C. Beta wts.
                                        Criterion

 1. Title present                         .03             .07
 2. Av. sentence length                  -.01            -.22
 3. Number of paragraphs                  .08             .02
 4. Subject-verb openings                -.14             .02
 5. Length of essay in words              .19             .15

 6. Number of parentheses                 .11             .05
 7. Number of apostrophes                -.19            -.05
 8. Number of commas                      .37             .18
 9. Number of periods                     .00            -.09
10. Number of underlined words           -.03            -.07

11. Number of dashes                      .22             .06
12. No. colons                            .08             .04
13. No. semicolons                        .04            -.00
14. No. quotation marks                   .17             .07
15. No. exclamation marks                -.07            -.05

16. No. question marks                   -.08            -.11
17. No. prepositions                      .17             .02
18. No. connective words                  .17             .02
19. No. spelling errors                  -.09            -.02
20. No. relative pronouns                 .04             .06

21. No. subordinating conjs.             -.13             .04
22. No. common words on Dale             -.44            -.09
23. No. sents. end punc. pres.            .01             .04
24. No. declar. sents. type A             .08             .01
25. No. declar. sents. type B             .06             .02

26. No. hyphens                           .24             .14
27. No. slashes                          -.01             .01
28. Aver. word length in ltrs.            .45             .10
29. Stan. dev. of word length             .48             .21
30. Stan. dev. of sent. length          4..04             .12

Cross-validation with same essays. As we have already suggested, the question of validity is a complicated one. One acceptable form of validity is surely the prediction of future behavior by the same student. But what is often meant is rather the prediction of what expert humans might say about a student's performance. This would become a kind of "concurrent" validity. In the context of the present project, such concurrent validity would consist of seeing to what degree the machine scorings of the essays would coincide with the human scorings. In one sense, we already know the result to be .71, since this was the discovered multiple correlation for both C and D essays, and represents just what is described: a measure of correlation between the machine scores and the human scores. As we have seen, however, such a coefficient capitalizes upon chance, and should be shrunken statistically. This has also been done, with a resultant shrunken (Wherry) coefficient of .67.

Even the shrunken coefficient is not completely satisfying, however, because of the fact that empirical data often deviate from the assumptions upon which such statistical manipulation is based. Besides, it is desirable to know how the machine algorithm will correlate with the individual judges.

For these reasons, it is most desirable to select randomly among the essays, and generate the weightings from this sub-sample, after which the weightings may be used to assign scores to those essays not included in the multiple-regression analysis. These machine scores may be correlated with the human ratings of these excluded essays, and this new correlation will represent a very appropriate measure of validity. The result of such a procedure is exhibited in Table IV-13.

TABLE IV-13

CROSS-VALIDATION COMPARISON OF

THE COMPUTER WITH FOUR HUMAN JUDGES

(Essay N = 138)

                    Judges

           A     B     C     D     E

    A            51    51    44    57
    B      51          53    56    61
    C      51    53          48    49
    D      44    56    48          59
    E      57    61    49    59

Note: Judge C is the computer. All cells represent correlation coefficients generated by comparing four human judge columns with machine scores on the same essays. The machine scores were those generated from 138 other essays written by other students, chosen at random from the same larger sample.

For this table, the computer-assigned scores, then, were generated from an analysis on 138 of the D essays, which were chosen by random methods from the 276 total sample. The weightings derived from the analysis were then applied to essays written by 138 other students, and the scores so assigned were correlated with the scores assigned by four human judges (Page, 1967a).
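The split-sample procedure just described can be sketched in a few lines. This is an illustration only, not the project's program: the data are synthetic, only two made-up proxes are used instead of thirty, and the helper names (`fit_weights`, `score`, `pearson`, `cross_validate`) are hypothetical. Weights are fitted by ordinary least squares on one random half and applied unchanged to the other half.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_weights(proxes, ratings):
    """Least-squares weights (intercept first) from the normal equations."""
    rows = [[1.0] + list(p) for p in proxes]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ratings)) for i in range(k)]
    return solve(xtx, xty)

def score(weights, proxes):
    """Apply fitted weights to raw prox scores."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], p)) for p in proxes]

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def cross_validate(proxes, ratings, rng):
    """Fit on a random half, score the other half, correlate with its ratings."""
    idx = list(range(len(ratings)))
    rng.shuffle(idx)
    half = len(idx) // 2
    fit_idx, hold_idx = idx[:half], idx[half:]
    weights = fit_weights([proxes[i] for i in fit_idx], [ratings[i] for i in fit_idx])
    predicted = score(weights, [proxes[i] for i in hold_idx])
    return pearson(predicted, [ratings[i] for i in hold_idx])

# Synthetic stand-in for 276 essays; the numbers carry no empirical meaning.
rng = random.Random(0)
proxes = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(276)]
ratings = [p[0] + 0.5 * p[1] + rng.gauss(0, 1) for p in proxes]
r_cv = cross_validate(proxes, ratings, rng)
print(round(r_cv, 2))
```

Because the held-out essays play no part in fitting, the resulting correlation needs no shrinkage correction.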

This Table IV-13 has often been presented to audiences as the clearest and simplest evidence of the effectiveness of machine strategies, and it has usually been presented without telling the audience which column in fact represents the computer. It is very difficult to guess which one it would be, yet given sufficient time, the occasional sophisticated psychometrician may be able to reason out that Column C is the most probable, and is indeed the computer column. The reason why this is detectable is again characteristic of the difference between man and machine. Surely the machine is not measuring the essay quality in the same way as the man. The machine is surely failing to attend to many of the important syntactic and semantic properties which influence the human judge. But the machine is in one sense more reliable than the human judge, and it is the reliability which gives it away: the coefficients for the machine (Column C) range from only .48 to .53, whereas the coefficients for the human judges (Columns A, B, D, and E) have ranges which are typically three times as large. The machine agrees with the human judges more consistently, then, than they agree with each other.
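The spread of each column can be tabulated directly from the coefficients of Table IV-13 (entered here as printed, with decimal points omitted). The sketch below is illustrative; the variable names are hypothetical.

```python
# Coefficients from Table IV-13, decimal points omitted as in the table.
table = {
    "A": {"B": 51, "C": 51, "D": 44, "E": 57},
    "B": {"A": 51, "C": 53, "D": 56, "E": 61},
    "C": {"A": 51, "B": 53, "D": 48, "E": 49},  # Judge C is the computer
    "D": {"A": 44, "B": 56, "C": 48, "E": 59},
    "E": {"A": 57, "B": 61, "C": 49, "D": 59},
}

# Range (largest minus smallest coefficient) for each judge's column.
spread = {judge: max(row.values()) - min(row.values()) for judge, row in table.items()}
narrowest = min(spread, key=spread.get)

for judge in sorted(spread):
    print(judge, spread[judge])
print("narrowest column:", narrowest)
```

The computer column spans 5 points, against 10 to 15 points for the human columns.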

Practical implications. Although striking, Table IV-13 does not merely represent a simple trick. Rather, it is the clearest analog so far to what might come from a large-scale, machine-based essay evaluation. For example, in a national essay quiz (such as the writing sample occasionally taken by the College Entrance Examination Board), the procedures would probably be quite similar to those which led to this table. In prior years, a number of essay topics would each be assigned to a fairly small test sample of students. Their writings would be intensively analyzed by expert humans for whatever research and norming purposes might be desired. Then from this pool of tested essay assignments, one stimulus would be chosen to be used across the country on the day of the major test. When collected, these essays would typically not be examined by human judges at all, but would rather be analyzed by computer programs already developed from the test sample. Only a few would subsequently be analyzed by human beings, to check for possible drift in sample, or for historical developments which might have altered the essay topic. In general, the crash scoring, in this hypothetical testing, would typically be entirely mechanical.

Summary. Earlier chapters introduced the rationale, basic design, and initial proxes used in this study, and presented the computer program used in their measurement. The present chapter has presented some of the findings from the study bearing on the basic questions of the agreement of the human judges with each other, and with machine scores of the same and of different essays. It has furthermore presented information about the proxes: their intercorrelation, their prediction of human judgments, and their reliability across trials. In most important comparisons, the machine scores were found to be practically interchangeable with the human ratings. This finding was most important when various types of validity were analyzed, one based upon prediction of student performance on another occasion, and the other applying measures generated from one set of students to a wholly different set of students. Some comment was also made about inferences from these findings for practical work in the future.

CHAPTER V

PREDICTING A PROFILE OF RATINGS

The last chapter explained in considerable detail the results from the attempts to predict the overall rating of some sets of essays. It was seen that the simulation was indeed very successful, and that the level of success had a number of implications for the future of such work. This chapter considers the more advanced case of simulating an analytic profile of an essay.

If a single overall rating were the only outcome from analysis, it would be a satisfactory substitute for some major tasks today, such as national essay exams (where a single pooled judgment is the usual product of evaluation), or many classroom situations (in which an overworked teacher marks only a letter grade and some redundant comment on a returned essay). Nevertheless, any substantial essay analysis must seek a level of performance nearer to that of the ideal teacher: with a much richer profile of the traits of the writing, so that students (and their instructors) may concentrate differentially on relatively weak skills in the profile; and with more detailed and direct comment about specific patterns or errors in the student's work. This chapter will concentrate on the trait ratings of the essays, and Chapter VIII will give some attention to the detailed and personal comment to the student.

The sample. For the reasons set forth in an earlier chapter, there was no cause for dissatisfaction with the Wisconsin essays (McColly and Remstad, 1963). They did not represent a typical high school student body, but the range was wide, and, with such early strategies, no important interactions with selections were expected. Furthermore, a number of replications had been successfully performed with other essays, with second essays written by the same students, and with a set of essays written by students in Indiana in an unrelated study conducted by Anthony Tovatt and his colleagues at Ball State University under the U. S. Office of Education (and reported elsewhere by Dr. Tovatt). For the phase of work considered in this chapter, therefore, it was decided to continue the analysis, this time working most intensively with Essay C (those based upon the question of whether "the best things in life" were really "free").

The new data needed were judgmental, then, for no evaluative data existed for the Wisconsin essays beyond the simple rating of overall quality. We therefore wished to establish a reliable set of ratings which would constitute a sensible descriptive profile of the strengths and weaknesses most commonly looked for in stylistic judgment. For such a requirement we would need: (1) a set of established and accepted dimensions; (2) a selection of judges who would be representative of qualified English teachers in general; (3) a sufficient number of judgments to overcome the inevitable halo effects, and to establish in truth a meaningful profile.

Just as with the essays, the investigators could afford to be reasonably relaxed about any randomness of selection, so long as judges met the general, personal and professional criteria, because stratification of region, type of school, and a myriad other possible considerations seemed unimportant so far as these particular generalizations of result are concerned. While there are differences among teachers in such dimensions, interaction of such dimensions with the purposes of the study seemed of negligible importance. And what seemed of much greater importance was the control of the rating situation.

The rating session. On July 16, 1966, then, under the principal supervision of Dr. Arthur Daigon, 32 English teachers met at the School of Education of the University of Connecticut for the purpose of grading student compositions for multiple traits. Because this group's judgments of student writing were to represent the evaluations of highly competent professionals, evaluations which would subsequently be simulated by a computer, selection of participants (setting randomness aside) was done with considerable care.

Ten chairmen of English departments in Connecticut secondary schools were invited to participate with teachers whom they could recommend as having special competence in the grading of student compositions, and who had at least three years of teaching experience. The department chairmen were also requested to give first preference to those skilled teachers who possessed master's degrees.

Of the 32 teachers who participated, 10 (31%) were department chairmen and 28 (87%) possessed M.A. degrees. The mean number of years of English teaching experience was 12.9 years, the median, 10 years.

Before the grading session began, the teachers were welcomed and acquainted with their task. Each would grade 64 compositions, assigning separate grades on a 5 point scale for each of 5 traits designated as "ideas or content", "organization", "style", "mechanics", and "creativity". Each English teacher-judge received both written and oral instructions relating to identification and scoring of the traits. Samples of a "good" composition and of a "poor" composition were distributed and considered in order to demonstrate how the traits could be scored and to suggest a range of possible response.

Eight 30-minute time periods were established. During each period each judge evaluated 8 compositions, which allowed about 3-1/2 minutes for the multiple trait judging of each composition. These arrangements permitted 8 judgments for each of 5 traits in each of the total of 256 compositions.

The assignment of essay to teacher and period was a formidable task, which required the computer. The problem was manifold: each essay could be assigned only once in each of the eight periods, so that there would be no important period effect hurting the essay evaluations, and so that we would not need multiple copies of the essays. Yet no essay could be given to any judge more than once. And it was also desirable to randomize the order of presentation within a period. These problems are rather easily solved if groups of eight essays are kept together, but such a procedure would obviously distort the evaluation of an essay in unknown ways. There is another major problem, in that it is easy to continue random assignment up to the last period, and find unresolvable conflicts of assignment, requiring branching back to some earlier point.

But eventually, with intensive work, the problem was solved and made completely automatic. The mechanics of the rating day were not a trivial problem, however, since we needed six graduate assistants performing the reassignments. Luckily, though, they could use punched and interpreted assignment cards, a by-product of the computerized assignment program, which also served as mark-sense rating records for later analysis.
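The assignment constraints described above can be sketched as a randomized deal with retries. This is not the project's assignment program, whose details are not given here; it is a minimal illustration that redeals a whole period whenever a conflict appears, rather than branching back to an earlier point. The function names and the retry limit are assumptions.

```python
import random

def deal_period(essays, judges, seen, per_judge, rng):
    """Try once to give every essay to exactly one judge for one period.

    Returns {judge: [essays in presentation order]}, or None on a conflict.
    """
    pool = essays[:]
    rng.shuffle(pool)  # random order also randomizes presentation order
    deal = {}
    for judge in judges:
        fresh = [e for e in pool if e not in seen[judge]]
        if len(fresh) < per_judge:
            return None  # this judge has already read too many of what is left
        chosen = fresh[:per_judge]
        deal[judge] = chosen
        taken = set(chosen)
        pool = [e for e in pool if e not in taken]
    return deal

def build_schedule(n_essays=256, n_judges=32, n_periods=8, seed=0, max_tries=10000):
    """Build a schedule in which no judge ever reads the same essay twice."""
    rng = random.Random(seed)
    essays = list(range(n_essays))
    judges = list(range(n_judges))
    per_judge = n_essays // n_judges
    seen = {judge: set() for judge in judges}
    schedule = []
    for _ in range(n_periods):
        for _ in range(max_tries):
            deal = deal_period(essays, judges, seen, per_judge, rng)
            if deal is not None:
                break
        else:
            raise RuntimeError("no conflict-free deal found for this period")
        for judge, chosen in deal.items():
            seen[judge].update(chosen)
        schedule.append(deal)
    return schedule

schedule = build_schedule()
print(len(schedule), "periods;", len(schedule[0]), "judges per period")
```

Redealing a single period is cheap at this scale, which is why the sketch avoids true backtracking; a larger or tighter design might need it.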

The rating criteria. In choosing the traits for rating, we desired well-established dimensions of writing quality. One of the most helpful documents was an eight-scale evaluation designed by Paul Diederich, and used at the Educational Testing Service ("Definitions of Ratings on the ETS Composition Scale," no date). Figure V-1 shows our adaptation of such suggestions.

Naturally, there are large differences among raters in their evaluations of the same essays. Since each rater read 64 of the essays, and since this number represents one fourth of the 256 used in the design, the probability of any one essay being read by any judge is clearly 1/4, and we might expect that, of the 64 essays read by judge A, about 1/4, or about 16, would be read by judge B. Clearly, with such small N's, we would not expect very secure estimates of the population agreement between two judges, but would rather expect a large random error.
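The expected overlap between two judges can be worked out explicitly. The first figure below reproduces the approximation in the text. The second is a refinement that is not stated in the report: it assumes each essay is read by 8 distinct judges, with the other 7 readers equally likely to be any of the remaining 31 judges.

```python
from fractions import Fraction

n_essays, per_judge = 256, 64
n_judges, readers_per_essay = 32, 8

# Approximation used in the text: each of A's essays has a 1/4 chance of reaching B.
approx_overlap = per_judge * Fraction(per_judge, n_essays)

# Refinement (an assumption, not a figure from the report): once A has read an
# essay, its other 7 readers are drawn from the remaining 31 judges.
refined_overlap = per_judge * Fraction(readers_per_essay - 1, n_judges - 1)

print(approx_overlap)           # 16
print(float(refined_overlap))   # a little under 14.5
```

Either way, each pairwise judge correlation rests on roughly 14 to 16 shared essays, which is the source of the large random error noted above.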

Table V-1 shows the intercorrelations among all 32 judges. The correlations are based upon the "total" scores, which were the average of the five trait ratings given by any judge to an essay. As can be seen, the median judge intercorrelation hovers around .5 for these total scores. The judges were instructed, as is clear in Figure V-1, to balance their ratings into a certain distribution, approximately normal, and it would be expected that their means and standard deviations would therefore be approximately equal. Table V-2 shows that this is indeed the case. Since 5 represents the best rating, and 1 represents the worst, the means are all seen to deviate from the expected 2.5 in a slightly generous direction. The nebulous "ideas" or "content" is the most tolerantly graded, with "organization" in second place. "Mechanics" has a middle position of severity of marking, and has decidedly the largest standard deviation of any trait. Teachers were thus more decisive about mechanics and, as we shall see later, they agreed more with each other about mechanics than they did about the other dimensions of essay quality.

FIGURE V-1

CRITERIA FOR RATING THE ESSAYS

I. Definitions of the basic traits to be rated.

A. Ideas or Content. The quantity and quality of the materials used to cover the subject.
B. Organization. The relationship between the parts of the paper and the whole.
C. Style. The use of language above and beyond the problems of mechanics.
D. Mechanics. Spelling, grammar, usage, punctuation, capitalization, numbers.
E. Creativity. The degree to which the paper finds a new, unexpected, yet fruitful way to approach the subject, to combine ideas, and to utilize language. An over-all trait.

II. Guides for rating the basic traits.

A. IDEAS OR CONTENT

High. The student covers the materials that the topic and plan of attack clearly call for. His understanding of the subject is good and he uses clear definitions. He has the ability to see the topic in a broader perspective than do the other students in his group, that is, he brings a broader experience to the topic.

Middle. The ideas are appropriate, but conventional and few in number. Some aspects of the topic are left out. The writer does not seem to have a well-stocked mind.

Low. The student omits many important aspects of the topic. He seems to have no store of knowledge to bring to bear on the topic and consequently repeats a few simple ideas over and over again.

B. ORGANIZATION

High. The student has a definite plan for discussing the assigned topic. If he is arguing for or against an idea, he presents relevant reasons in an effective order. If he is describing something, he does so according to some scheme (top to bottom, order of importance, order of complexity, etc.). If the student is explaining a concept or process he uses a coherent plan of analysis, or definition, or illustration. The student has a good sense of what is relevant to his plan and avoids repetition. He shows a sense of proportion in treating the various parts of his essay.

Middle. The student shifts his plan of discussion, or introduces irrelevant material, or spends too much time on unimportant things, or repeats himself. He develops the assigned topic by free association (what comes to my mind when I think of Hawaii?) rather than by working toward a definite purpose.

Low. The student does not seem to have given any thought to what he intended to say before he started to write. He offers no plan of discussion. The paper seems to start in one direction, then another, then another, until the reader is lost. The main points are not clearly separated from one another, and they come in a random order.

C. STYLE

(There are many aspects of style that may enter into a rating: individuality, vividness, elegance, etc. However, for the purposes of this experiment we are interested in three stylistic traits only: clarity, variation, and range of linguistic resources.)

High. The student uses language in a way that makes comprehension of the paper easy. He uses appropriate words in their normal sense. He puts the words in their normal order. He is careful to signal his transitions. He avoids ambiguity and he does not frustrate the reader's expectations. At the same time the student avoids monotonous repetitions of words, phrases, and sentence structures. Finally, he reveals a command of a good range of linguistic resources. His vocabulary is good, he uses parallel structures, he makes subtle use of subordination, and so on.

Middle. The student occasionally brings the reader up short by choosing a bizarre, inappropriate word or phrase, or by introducing a distracting metaphor, or by misplacing a modifying phrase or clause, or by making unexpected transitions. The repetitions of words, phrases, and sentence structures become monotonous. The resources of language are limited. The writer is addicted to tired old phrases and hackneyed expressions.

Low. Vague use of words. Ambiguous references. Awkward constructions. Childish vocabulary and sentence structure.

D. MECHANICS

High. The sentence structure is usually correct, even in varied and complicated sentence patterns. No violation of established spelling rules. Even the hard words are usually spelled correctly. No serious violations of the rules of punctuation, capitals, abbreviations, and numbers.

Middle. An occasional syntax problem. Hard words are occasionally misspelled. Some violations of the rules concerning punctuation, etc.

Low. The student borders on the illiterate.

E. CREATIVITY

High. The student surprises us with a new and fruitful way of looking at the problem. He brings to bear new data in treating the topic. He finds a fresh and interesting way of using language that illuminates his ideas.

Middle. The student thinks of the expected things. He treats them in a way that most people would treat them. He makes use of ordinary expressions and sentence structures.

Low. The student works with cliches of thought and expression. Does not go beyond the most superficial treatment of the subject. Repeats formulas without really grasping their meaning.

Try for the following overall balance:

RATING
  5    TOP 7% or so.          About 2 of each 16 essays.
  4    NEXT TOP 25%.          About 4 of each 16 essays.
  3    MIDDLE 35%.            About 6 of each 16 essays.
  2    NEXT BOTTOM 25%.       About 4 of each 16.
  1    BOTTOM 7% or so.       About 2 of each 16.

TABLE V-1

INTERCORRELATION MATRIX FOR 32 RATERS FOR TOTAL SCORES

[The 32 x 32 matrix of coefficients is not legible in this copy.]

TABLE V-2

MEANS AND STANDARD DEVIATIONS OF THE

FIVE TRAITS AND THEIR AVERAGE

                                  Standard
Trait                  Mean       Deviation

1. Ideas               3.068        .640
2. Organization        2.950        .675
3. Style               2.827        .619
4. Mechanics           2.869        .771
5. Creativity          2.833        .641
6. Trait average       2.909        .610

Contribution of the proxes. It is of interest to see how each prox contributed to the prediction of the five various traits, and this information is contained in the next five tables (V-3 to V-7). The information for the average of the five traits is contained in Table V-8. The first column has the number of the proxes and, in the first of these tables (V-3), the name of the prox as well. Column B has the correlation with the criterion for each prox. Column C has the B-weights for each prox. And Column D has the computed t-values for each prox.

Column C, which has the B-weights for each prox, should not be confused with the Beta weights given, for example, in Table IV-3 of the last chapter. The B weight is the coefficient which is actually used, together with the raw prox score, to optimize the predictive value of any prox in an applied situation. In other words, given two proxes of the same Beta coefficients, the one having a larger standard deviation will have a smaller B-weight. While these B-weights may not be compared directly with those Beta coefficients given in the last chapter, they may be compared with the corresponding B-weights of the other traits given in this chapter, though any such comparison would be simply monotonic, and differences could not be easily compared between proxes. The relative contribution of each prox may be inferred from the t-values in Column D. The absolute values of these t's are monotonically related to the rank order of contribution of the proxes to the prediction. For example, Table V-3 shows that the highest contribution was made by the fifth prox, showing a t-value of 6.38, far ahead of any other. When it is considered that Prox #5 is "length of essay," and that the trin is "ideas or content," we are struck by the obviousness of the relation. The more words used, the more content the essay is believed to have.

TABLE V-3

PROX CONTRIBUTION TO THE PREDICTION OF IDEAS OR CONTENT (N = 256)

A.                                 B.            C.            D.
Proxes                             Corr. with    B wts.        t-value
                                   Criterion

 1. Title present                   .01           .14973        1.37
 2. Av. sentence length            -.07          [illegible]   [illegible]
 3. Number of paragraphs           [illegible]   -.03357       -1.79
 4. Subject-verb openings          -.19          -.00326       -1.38
 5. Length of essay in words        .37           .00205        6.38
 6. Number of parentheses          -.04          -.00875       -1.23
 7. Number of apostrophes          [illegible]   -.00640       [4.73?]
 8. Number of commas                .37           .00601        3.73
 9. Number of periods               .03          -.00007       -0.01
10. Number of underlined words     -.04          -.01053       -1.14
11. Number of dashes                .36           .03283        4.63
12. No. colons                      .07           .02032        1.03
13. No. semicolons                  .13           .00964        0.90
14. No. quotation marks             .15           .00015        1.25
15. No. exclamation marks          -.05          -.00272       [4.26?]
16. No. question marks              .08          -.00121       -0.56
17. No. prepositions               [illegible]    .02926        1.54
18. No. connective words           [illegible]   [40797?]       0.12
19. No. spelling errors            [illegible]   -.05672       [.4.55?]
20. No. relative pronouns          -.07          [.0006?]       1.09
21. No. subordinating conjs.       [illegible]    .02955        0.88
22. No. common words on Dale       -.34          -.00270       -0.22
23. No. sents. end punc. pres.      .17           .03073        1.33
24. No. declar. sents. type A      -.00          -.01298       [-43.60?]
25. No. declar. sents. type B       .04          -.01483       -0.52
26. No. hyphens                     .43           .00717        0.99
27. No. slashes                     .05           .03393        0.66
28. Aver. word length in ltrs.      .34          -.00094       -0.34
29. Stan. dev. of word length       .43           .00921        3.17
30. Stan. dev. of sent. length     [illegible]    .00200        1.38

Intercept constant        -1.01123
Multiple correlation       0.72301
Std. error of estimate     0.47093
F = 8.24

TABLE V-4

PROX CONTRIBUTION TO THE PREDICTION OF ORGANIZATION (N = 256)

A.         B.            C.            D.
Proxes     Corr. with    B wts.        F-value
           Criterion

 1.         .01           .10786        0.82
 2.        -.11          -.00229       -0.96
 3.        [illegible]   -.01289       [.4457?]
 4.        -.20          -.00343       -1.21
 5.         .23           .00124        3.20
 6.        -.08          -.01417       -1.65
 7.        [illegible]   -.00253       -0.57
 8.         .31           .00515        2.65
 9.        [illegible]    .00399        0.53
10.        -.06          -.01267       -1.14
11.         .26           .02136        2.50
12.         .07           .02091        0.88
13.         .10           .00449       [005?]
14.         .18           .00035        2.44
15.        -.03          -.00287       -1.11
16.        [illegible]   -.00199       -0.78
17.         .12           .01855        0.81
18.         .13           .05610        0.72
19.        -.16          -.24534       -1.98
20.        -.07           .03517        0.91
21.        -.10           .01539        0.38
22.        -.33          -.00580       -0.39
23.         .17           .04423        1.59
24.        [illegible]    .02411       -0.95
25.         .06          -.02201       -0.64
26.         .20           .00527        0.60
27.         .02           .01598        0.16
28.         .33           .00082        0.25
29.         .38           .00695       [14.99?]
30.         .05           .00118        0.68

Intercept constant         4.29180
Multiple correlation       0.61455
Std. error of estimate     0.56709
F = 4.55

TABLE V-5

PROX CONTRIBUTION TO THE PREDICTION OF STYLE (N = 256)

A.         B.            C.            D.
Proxes     Corr. with    B wts.        F-value
           Criterion

 1.        -.02           .10535        1.01
 2.        -.12          -.00385       -2.04
 3.         .07          -.03715       -2.08
 4.        -.17           .00067        0.30
 5.         .23           .00123        3.99
 6.        -.02          -.00505       -0.74
 7.         .07          [..00254?]    [4.72?]
 8.         .41           .00624        4.05
 9.         .08          [4016?]        0.53
10.        [-.0?]        -.01147       -1.30
11.         .32           .02689        3.97
12.         .06          -.00236       -0.13
13.         .13           .00880        0.86
14.         .16           .00019        1.64
15.        [illegible]   -.00329       -1.60
16.         .11          -.00271       -1.35
17.         .19          [.0214277?]   [2.211?]
18.         .15           .11271        1.82
19.        -.15          -.20081       -2.04
20.        -.06           .05119        1.66
21.        -.09           .04533        1.41
22.        -.39          -.00476       -0.40
23.         .18           .04894        2.22
24.        -.04          -.03366       -1.62
25.         .05          -.02750       -1.00
26.         .32           .01625        2.34
27.         .05           .05503        0.70
28.         .40           .00044       [4417?]
29.         .47           .00906        3.26
30.         .11           .00315        2.28

Intercept constant        -1.33964
Multiple correlation       0.72951
Std. error of estimate     0.45042
F = 8.53

TABLE V-6

PROX CONTRIBUTION TO THE PREDICTION OF MECHANICS (N = 256)

A.         B.            C.            D.
Proxes     Corr. with    B wts.        F-value
           Criterion

 1.        [..00?]        .11647        0.83
 2.        -.33          -.00258       -1.02
 3.        -.01          -.02655       -1.11
 4.        [illegible]    .00380        1.26
 5.         .06           .00053        1.29
 6.        -.02          -.00483       -0.53
 7.        -.07          -.00234       -0.49
 8.         .29           .00579        2.80
 9.         .16           .00684        0.85
10.        -.06          -.00745       -0.63
11.         .23           .02192        2.42
12.         .12           .03514        1.39
13.         .10           .01353        0.99
14.         .09           .00017        1.07
15.        -.05           .00106        0.39
16.         .04           .00288        1.05
17.         .17           .06407        2.64
18.         .14           .17301        2.09
19.        -.30          -.60565       -5.21
20.        -.08          [,,.01675?]    0.41
21.        -.06           .04182        0.97
22.        -.28           .03763        2.36
23.         .21           .00580        0.20
24.         .07           .02471        0.89
25.        -.01           .00433        0.12
26.         .25           .02133        2.30
27.         .00           .10975        1.05
28.         .39           .00535        1.54
29.         .42           .01128        3.04
30.         .00           .00136        0.74

Intercept constant        -9.53911
Multiple correlation       0.67796
Std. error of estimate     0.60314
F = 6.38

TABLE V-7

PROX CONTRIBUTION TO THE PREDICTION OF CREATIVITY (N = 256)

A.         B.            C.            D.
Proxes     Corr. with    B wts.        F-value
           Criterion

 1.        -.02           .13550        1.21
 2.        -.12          [.40037?]     [.0.19?]
 3.         .09          -.05943       -3.11
 4.        -.17          -.00130       -0.54
 5.         .34           .00208        6.36
 6.        -.08          -.01716       -2.36
 7.         .00          -.00026       -0.07
 8.         .39           .00648        3.94
 9.         .11           .01684        2.64
10.        -.06          -.01797       -1.91
11.         .29           .02680        3.70
12.         .09           .01817        0.90
13.         .16           .02121        1.94
14.         .12           .00015        1.18
15.         .00          -.00363       -1.65
16.         .10          -.00295       -1.35
17.         .13           .02480        1.28
18.         .07           .02156        0.33
19.        -.11          -.09013       -0.86
20.        -.07           .04230        1.29
21.        -.08           .06215        1.81
22.        -.32          -.01342       -1.06
23.         .15           .03678        1.56
24.        -.05          -.03993       -1.80
25.         .04          -.04165       -1.42
26.         .30           .01846        2.49
27.         .05           .06153        0.74
28.         .28          -.00322       -1.16
29.         .36           .00820        2.77
30.         .10           .00232        1.57

Intercept constant         1.21571
Multiple correlation       0.70938
Std. error of estimate     0.48074
F = 7.60

TABLE V-8

PROX CONTRIBUTION TO THE PREDICTION OF AVERAGED RATING ACROSS 5 TRAITS (N = 256)

A.         B.            C.            D.
Proxes     Corr. with    B wts.        F-value
           Criterion

 1.        -.00           .12299        1.18
 2.        -.13          -.00250       -1.33
 3.         .08          -.03392       -1.90
 4.        -.18          -.00071       -0.31
 5.         .26           .00143        4.65
 6.        -.05          -.00999       -1.47
 7.        -.07          -.00281       -0.80
 8.         .38           .00593        3.86
 9.         .10           .00615        1.03
10.        -.06          -.01202       -1.37
11.         .32           .02596        3.84
12.         .09           .01844        0.98
13.         .14           .01154        1.13
14.        -.15           .00020        1.75
15.        -.03          -.00229       -1.11
16.         .09          -.00121       -0.59
17.         .17           .03559        1.97
18.         .13           .07427        1.20
19.        -.19          -.25574       -2.61
20.        -.08           .03609        1.17
21.        -.09           .03885        1.23
22.        -.36           .00219        0.18
23.         .20           .03330        1.51
24.        -.01          -.01733       -0.83
25.         .04          -.02033       -0.74
26.         .28           .01370        1.98
27.         .06           .05924        0.76
28.         .38           .00031        0.12
29.         .45           .00894        3.23
30.         .08           .00200        1.45

Intercept constant        -2.39336
Multiple correlation       0.72145
Std. error of estimate     0.44944
F = 8.14

In no other trait is essay length quite so dominant, though it plays an almost equal role for Table V-7, where "creativity" is the trin. When we consider how often creativity is measured by tests of fluency and fecundity, we are again struck by the obviousness of the relation. Similar comparisons can be made for other traits and other proxes.
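The distinction between B-weights and Beta weights drawn earlier in this chapter can be made concrete with the standard relation between a raw-score weight and a standardized weight. The sketch below is illustrative only: the numbers are invented and the function names `b_weight` and `predict` are hypothetical, not taken from the project's programs.

```python
def b_weight(beta, sd_criterion, sd_prox):
    """Raw-score (B) weight implied by a standardized (Beta) weight."""
    return beta * sd_criterion / sd_prox

def predict(intercept, b_weights, prox_scores):
    """Predicted rating: intercept plus the B-weighted sum of raw prox scores."""
    return intercept + sum(b * x for b, x in zip(b_weights, prox_scores))

# Two hypothetical proxes with the same Beta but very different spreads.
b_small_sd = b_weight(beta=0.30, sd_criterion=0.6, sd_prox=2.0)
b_large_sd = b_weight(beta=0.30, sd_criterion=0.6, sd_prox=50.0)
print(b_small_sd, b_large_sd)

# Applying made-up weights to made-up raw prox scores.
print(predict(1.0, [0.5, 2.0], [2.0, 3.0]))
```

The prox with the larger standard deviation receives the smaller B-weight, which is why B-weights for different proxes cannot be compared directly.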

If we rank order the top five contributors for each trait separately, the results are interesting. For the trait of ideas, we find the proxes, according to the absolute value of Column D, to contribute in the following order:

1st) length of essay
2nd) frequency of dashes
3rd) frequency of commas
4th) standard deviation of word length
5th) number of paragraphs

For the trait of organization, the order of contributory proxes is:

1st) length of essay
2nd) frequency of commas
3rd) frequency of dashes
4th) frequency of quotation marks
5th) standard deviation of word length

For style:

1st) frequency of commas
2nd) length of essay
3rd) frequency of dashes
4th) standard deviation of word length
5th) frequency of hyphens

For mechanics:

1st) spelling errors
2nd) standard deviation of word length
3rd) proportion of prepositions
4th) frequency of commas
5th) frequency of dashes

For creativity:

1st) length of essay
2nd) frequency of commas
3rd) frequency of dashes
4th) number of paragraphs
5th) standard deviation of word length

And for the average of all five traits, calculated for each essay, the order is as follows:

1st) length of essay
2nd) frequency of commas
3rd) frequency of dashes
4th) standard deviation of word length
5th) spelling errors
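The ranking rule used for these lists, ordering proxes by the absolute value of Column D, can be sketched with the Column D values of Table V-8 as read from that table. The prox names are those given in Table V-3; the variable names are illustrative.

```python
# Column D of Table V-8 (averaged rating), keyed by prox number.
column_d = {
    1: 1.18, 2: -1.33, 3: -1.90, 4: -0.31, 5: 4.65, 6: -1.47,
    7: -0.80, 8: 3.86, 9: 1.03, 10: -1.37, 11: 3.84, 12: 0.98,
    13: 1.13, 14: 1.75, 15: -1.11, 16: -0.59, 17: 1.97, 18: 1.20,
    19: -2.61, 20: 1.17, 21: 1.23, 22: 0.18, 23: 1.51, 24: -0.83,
    25: -0.74, 26: 1.98, 27: 0.76, 28: 0.12, 29: 3.23, 30: 1.45,
}

# Names for the proxes that appear in the top five (from Table V-3).
names = {
    5: "length of essay",
    8: "frequency of commas",
    11: "frequency of dashes",
    19: "spelling errors",
    29: "standard deviation of word length",
}

# Sort prox numbers by descending absolute value; the sign is ignored.
ranked = sorted(column_d, key=lambda prox: abs(column_d[prox]), reverse=True)
top_five = ranked[:5]

for place, prox in enumerate(top_five, start=1):
    print(place, names.get(prox, f"prox {prox}"), column_d[prox])
```

Spelling errors enter the list on the strength of a negative value, since only magnitude counts in the ranking.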

There is obvious noise in any comparisons of such listings. In the first place, there is random error, which is considerably higher in calculating Beta weights, or these similar multivariate t-values, than in calculating bivariate relationships. In the second place, in the trins themselves there is a high degree of halo effect, as will be shown soon. We would expect the first of these problems to be exhibited in rather wild and unexplained loadings, that would not necessarily be replicated in cross-validations. We would expect the second problem to be evident in the occurrence of some common proxes in all lists, as here we see word length to be an important correlate of all traits, and commas to be another.

Nevertheless, there are ways in which the differences among these rankings are intuitively pleasing. Length of essay is of first importance for three traits, and second for a fourth. But on one list, essay length does not occur at all, and this one is the list for mechanics. On the other hand, for mechanics we find the only inclusion of spelling errors. The evaluation of mechanics is clearly a rather negative one, in which mistakes count against the student, and it seems better for a student to be short and safe, than to be fluent.

In summary, these tables furnish many interesting comparisons for differential study of the contributions of the proxes to the central dimensions of ratings which were studied here. While there is considerable overlap of the important proxes, there is also some difference in weighting which increases the accuracy of prediction.

The uniqueness of the traits. A constant danger in multi-trait ratings is that they may reflect little more than some general halo effect, and that the presumed differential traits will really not be meaningful. This danger is one reason for having eight judgments for each essay, since it was predicted that the halo would be extremely large. And the evidence we have already seen, showing the relative contribution of the proxes to the prediction of the traits, supports this suspicion of a large halo effect. This halo is demonstrated in Table V-9, which shows the intercorrelations among the judged traits of the essays, as rated by eight teachers for each essay. From this table it is clear that mechanics is the most maverick trait, having little to do with ideas, organization, or creativity, but considerably more to do with style. We find a very large halo, or tendency for ratings to agree with each other. It may be noticed that, because of the interdependence of these ratings, and their mode of assignment by the judges, the intercorrelations here are in some cases actually larger than the true reliability of the group ratings. Some reflection will show how this could be: once a general level of rating is assigned, reliable or not, it will carry all the traits along together with it. The variable "all traits" on Table V-9 refers to the average of the five traits for an essay, counted as equally important. Naturally, the large correlations shown between "all traits" and the individual traits are inflated by a part-whole relationship.

TABLE V-9

INTERCORRELATIONS OF THE TRAITS OF JUDGED ESSAYS (N=256)

TRAIT                 1     2     3     4     5     6

1. Ideas             --    .86   .86   .68   .89   .93
2. Organization      .86   --    .82   .69   .82   .91
3. Style             .86   .82   --    .83   .86   .95
4. Mechanics         .68   .69   .83   --    .65   .85
5. Creativity        .89   .82   .86   .65   --    .92
6. All traits        .93   .91   .95   .85   .92   --

To test the uniqueness of the traits, James J. Roberge and others performed the analysis of variance shown in Table V-10. There is of course a huge variance between essays, and we also find a large variance between traits (explained by the mean differences we saw in Table V-2). What is important in this Table V-10 is the significant trait-by-essay interaction, which demonstrates that there is a reasonably reliable profile displayed, and that indeed there is some "validity" in the different ratings.
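The mean squares and F ratios of Table V-10 follow directly from its sums of squares and degrees of freedom. The short sketch below recomputes them as a check. The dictionary keys and function names are illustrative labels only, and the choice of error term for each effect (essays against the between-judgments error, traits and the trait-by-essay interaction against the within-judgments error) is an assumption inferred from the layout of the table.

```python
# Sums of squares and degrees of freedom as reported in Table V-10.
# The key names are illustrative labels, not taken from the report.
table_v10 = {
    "between_essays": (3791.293, 255),
    "error_between": (4439.012, 1792),
    "between_traits": (84.212, 4),
    "trait_x_essay": (805.089, 1020),
    "error_within": (2675.113, 7168),
}


def mean_square(source):
    """Mean square = sum of squares divided by degrees of freedom."""
    ss, df = table_v10[source]
    return ss / df


def f_ratio(effect, error):
    """F = mean square of the effect over mean square of its error term."""
    return mean_square(effect) / mean_square(error)


# Assumed pairing of effects with error terms (inferred from the table layout).
f_essays = f_ratio("between_essays", "error_between")
f_traits = f_ratio("between_traits", "error_within")
f_interaction = f_ratio("trait_x_essay", "error_within")

for label, value in [
    ("Between essays", f_essays),
    ("Between traits", f_traits),
    ("Trait x essay", f_interaction),
]:
    print(f"{label}: F = {value:.3f}")
```

Under these assumptions the recomputed ratios agree with the F column of the table to about three decimal places.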

It would be possible, of course, to extract the halo, and to work with the residual, and unique, trait variance for various prediction purposes. We chose not to do this for two reasons: In the first place, we would need considerably more raters for each essay, since the residual trait variance, after the halo was subtracted, would be far less reliable than the original rating, and would make a much less secure goal to simulate. In the second place, and more importantly, we were interested in simulating the real ratings actually given for a certain trait by real human judges with appropriate expertise. And when this is a primary goal, then the halo behavior is an appropriate part of the simulation target, whether "pure" or not.


TABLE V-10

TRAIT BY ESSAY INTERACTION

Source                      SS          df        MS         F

Between judgments        8,230.305    2,047
  Between essays         3,791.293      255    14.868     6.002
  Error between          4,439.012    1,792     2.477

Within judgments         3,564.414    8,192
  Between traits            84.212        4    21.053    56.412
  Trait x essay            805.089    1,020      .789     2.115
  Error within           2,675.113    7,168      .373

Total                   11,794.719   10,239

*This table is based upon essay evaluation of July 1966, during which each of 256 essays was judged by eight different judges during eight different periods.

Judge viewpoints. One effort within this project to improve the predictability of judgments was undertaken by Herbert Garber and Robert Shostak (1967), and reported at the Annual Meeting of the American Psychological Association. Their work aimed at raising the multiple R, but by purifying the trin, rather than by optimizing the proxes. Much of this section is quoted or paraphrased from their presentation.

It is well known that one source of error variance in a correlation study is the unreliability of the criterion. Indeed, working under the assumption of a multivariate normal population, it seems reasonable that in any random sample, errors ought to be distributed with equal frequency in all variables, including the criterion (Hays, 1963, p. 573). The inspiration for this particular approach came, in part, from work reported in Educational and Psychological Measurement, volumes 26 and 27, by Naylor and Wherry, and also from an article by Jackson and Messick (1961). The first two investigators used a factor-analytic approach to do what they call "capture rater policy". They used as subjects Air Force supervisory personnel. The latter pair described a similar technique for use in studying social perception of personal status. One procedure which is related to the one of Garber and Shostak is also reported by Christal (1963).

The departure in the present section was first simply to find, from among the 32 reader-graders in Project Essay Grade, those clusters of readers who tended to agree with one another no matter what their policy. In this case, their revealed agreement would emerge from factor-analyzing judges, not essay grades. The next step after identifying a "clear" cluster of judges was to use their individual unpooled grades as the criterion in a new multiple correlation computation to see if the multiple coefficient would rise. If it would, then the demonstration had worked as hoped. A set of judges had been revealed by the analysis who were to a larger extent "predictable" from a fairly mechanical, computer-executed count of what we have labelled "proxes".

What Garber and Shostak wanted was to have a factor analysis done on raters, and since as few as five essays, in one case, had been read in common by a pair of raters, it seemed impractical to proceed to the computation of an intercorrelation matrix from a data matrix three-quarters empty. However, Dieter Paulus arranged to have the incomplete 256 by 32 score matrix processed at Cornell by Larry Wightman. The Cornell program inserted appropriate correlation values based on the varying numbers of observations available for each essay. On the average, 16 were present. Thus prepared, the Cornell computer next processed the judge score intercorrelations and computed factor matrices both by the components and factor analysis procedures. The eleven-factor matrix from the latter computation produced about three or four fairly clear clusters comprised of about as many individual judges. By "clear" is meant a positive loading of over .65 on a single factor and no other positive loadings greater than .43. Negative loadings were ignored. One cluster comprised of four judges which met the above criteria was selected by inspection.

For each of these judges, 64 essays and the rating he gave to each were gathered and then served as input to the next step in the project. A modification of the IBM Scientific Subroutine Program (SSP) for System 360 on multiple regression was used at the University of Connecticut to compute an overall multiple R from the four readers and their essays' prox scores. The result was a coefficient of .65. At first blush this looks as if it were a disappointing outcome until one is reminded that the multiple prediction is based in this instance on an unpooled, unweighted criterion so that any disagreement among the four readers and any dissimilarity in the proxes of the four more or less unique essay sets they read both contribute to a lowered overall reliability. For one must remember that among these four highly correlated judges who were finally picked to be a test case of a refined criterion, most pairs had read no more than 1/4 of their essays in common. Therefore, to get some reasonable basis of comparison, four other judges were selected by a random procedure and their data were treated in exactly the same way. This time, a multiple coefficient of only .545 resulted. Comparing the respective coefficients of multiple determination, .420 and .297, we see that a bit over 12% more of the total variance in the criterion scores has been accounted for when using the selected judges. Putting it in terms of forecasting improvement over the "random four" judges we realize that using the technique here described we have a 40% improvement over "chance".

Let us restate what was done by Garber and Shostak. There was a fixed sample of high school essays. A multiple correlation coefficient of over .71 was computed when an averaged rating obtained from eight randomly-drawn readers from a 32-reader pool was utilized as the criterion, and 30 approximations to writing ability such as length of essay, number of commas, use of uncommon words, and standard deviation of word length served as the predictors. Next, a factor analysis based on the unequal numbers of observations for the 496 pairs of judges in the 32-judge pool was calculated. From this analysis one of several sets of variables (really judges), which were more or less clearly identifiable by the simple structure criterion, was selected as new criteria for a multiple correlation computation. However, this time the prediction would be much more stringent. There could be no approximation to a "true" grade for each essay in the usual test-theory sense of a mean score from repeated sampling. Instead, the error term would contain increased variance from two sources and to an unknown degree. These error components were, on one hand, from the relative lack of overlap in the actual essays mutually read by four judges, as contrasted with the higher number of "same" essays that eight judges from a 256-essay "population" had read. The other error source was the lack of the beneficial effects due to cancelling out random errors which occurred when eight ratings were averaged for each essay.

To get an estimation as to what had been gained from this method of judge selection, a comparison was made with a random selection of four other judges from the remaining 28. Nearly 40% more predictability was found to be the estimated gain. A sampling distribution of multiple R's could have been gathered, and thus a crude sort of significance approximation calculated, for the R obtained on the selected cluster and on the other untried clusters. However, one would have serious misgivings about any generalizability of such findings to other samples from the same population of essay writers and graders.

What are some implications from this study? In the words of Garber and Shostak:

First, it has been shown empirically that, by this technique, one specific small gain may be made toward the goal of increasing the multiple R in essay grading by computer through selecting of criteria on the basis of clusters of consistent viewpoint among a random sample of readers.

Second, by some technique like stepwise regression using such identified clusters for criteria, knowledge may emerge about the essay evaluation process itself.

And, third, as Davis (1965) suggested in his critique of Project Essay Grade, we may by the route marked out here avoid wiring students to an "easy mass standard" writing style since, instead of exposing merely a simplicity in human writing behavior, we can begin to uncover some sources of dissonance and raucous rumblings among the lions who rule the teaching of effective composition.

The possibility outlined here has not yet been capitalized upon for the production of higher multiple-R coefficients, but it may be regarded as a tool for future investigation. It may in the future be extended to trait analysis as well as an overall dimension, if it proves feasible in later study.

Trait prediction by machines. For our present stage of development of a new discipline, perhaps the best comparisons may still be those of the human expert compared with the machine. As we have seen, eight expert judges, randomly selected from a qualified panel of 32 such judges, read every essay and evaluated it for ideas, organization, style, mechanics, and creativity. We were particularly interested in including the last named, because of the common objections encountered to this sort of work by some teachers in the humanities.

From the beginning, humanists have often miscalculated the difficulties in essay analysis, and imagined that specific criticisms of punctuation and usage might be easy to program for the computer, but that global measures such as overall quality, style, or creativity would be virtually impossible. In one sense, quite the reverse is true: We have had prompt success in actuarially simulating the ratings of these subjective traits, as all of our data reported so far would suggest. Yet a really sound decision about the correctness of usage of a comma, or the agreement of subject and verb, is a problem which presumes a great amount of analysis, and some of the necessary background routines have not yet been programmed for any project anywhere. We shall discuss these problems later.

Surely, from this humanistic point of view, the most challenging problem of all would be to measure creativity, since by such reasoning a creative work, or original work, is by definition unlike the others, and is unique; and therefore it requires a recognition procedure which could not be programmed in advance.

Obviously, the first step is to remove the problem from the humanistic viewpoint and put it within a viewpoint susceptible to behavioral analysis. In order to do this, we must ask: How is creativity recognized? How do we know when we have achieved it? The only possible answer seems to be that a work is creative when people say it is creative; there is no evaluative procedure above human judgment for deciding whether something is imaginative, or original.

But once again, we must appeal to behavioral science. If we use one judge (and the humanist, when pressed, will often designate himself as sole arbiter), then we have a very uncertain criterion. We do not know to what extent the evaluations made by this judge will correspond with the "true" creativity in the work. Therefore, we must ask other judges to assess the work independently of each other and of the first one, and we must regard their judgments, in absence of evidence to the contrary, as equally valid, if they are equally "authorities" in such matters (however qualifications might be established). We still do not know how well these judgments correspond with the "true" creativity in the work, but at least we can ascertain how well they correspond with each other.

We must, in the end, assume that the population of all such expert ratings would indeed represent our best estimate of any such "true" creativity. Not to admit this would leave us in hopeless solipsism. And when we do admit such a criterion, and when such ratings are made of large numbers of essays, each of which may or may not possess "creativity," we are led to additional discoveries about the trait. And these discoveries are quite contrary to the usual humanist way of thinking.
For the distribution of creativity turns out to be approximately normal, and approximately continuous. And it has (as we note in Table V-2) a standard deviation of .641 rating points, which is just in the middle of the five traits. That is, it does not appear to be a purely "qualitative" trait, which may at once be recognized for its presence or absence. Furthermore, to emphasize the apparent continuous quality of the trait, the reliability of human judgment of creativity was the lowest for any of the five traits, as will be seen presently.

In short, then, there is every reason to regard "creativity" as a criterion rating like any other. And to regard in the same way originality, imagination, and other near synonyms used or implied by the instructions to the raters, shown in Figure V-1. Of course, the distribution is in part a result of the instructions regarding such distributions, but there was no apparent tendency by the teachers to force it into a yes-no pattern.

The data for all five traits, then, are shown in Table V-11, which may represent the most complete statement yet about the comparative success of the basic proxes so far presented. Column A of course lists the traits by title. Column B shows us the reliability of the pooled sum of eight independent judges, calculated for each trait.

TABLE V-11

Computer Simulation of Human Judgments
For Five Essay Traits
(30 predictors, 256 cases)

A.                        B.          C.       D.          E.
Essay                     Hum.-Gp.    Mult.    Shrunk.     Corr.
Traits                    Reliab.     R        Mult. R     (Atten.)

I.   Ideas or Content     .75         .72      .68         .78
II.  Organization         .75         .62      .55         .64
III. Style                .79         .73      .69         .77
IV.  Mechanics            .85         .69      .64         .69
V.   Creativity           .72         .71      .66         .78

NOTE: Col. B represents the reliability of the human judgments of each trait, based upon the sum of eight independent ratings, August 1966. Col. C represents the multiple-regression coefficients found in predicting the pooled human ratings with 30 independent proxes found in the essays by the computer program of PEG-IA. Col. D presents these same coefficients, shrunken to eliminate capitalization on chance from the number of predictor variables (cf. McNemar, 1962, p. 184). Col. E presents these coefficients, both shrunken and corrected for the unreliability of the human groups (cf. McNemar, 1962, p. 153).

The results here are not very surprising: mechanics shows the best agreement, and creativity the least, and this is in accordance both with intuition and other work on ratings. That is, English teachers are readier to agree on whether a word is misspelled, or an improper verb form used, than they are on whether a student's writing is original or shows imagination. We remember that spelling errors, inadequate as our list of misspellings is, nevertheless correlated -.30 with mechanics, while length of essay became a major contributor to creativity. It is clear, in any case, that human judges have a more difficult time with creativity than with other traits.

But what of the computer? Column C shows the raw multiple-R coefficients, predicting these criteria, unreliable as they are, from the prox measurements. Here we see that mechanics enjoys no advantage; to the contrary, it is more poorly evaluated by the computer than creativity is, and organization more poorly evaluated still. This relative standing, as we have seen, is contrary to the intuition of the humanist about what is easy, and what is hard, in the computer evaluation of prose.

Column D has made the reduction in MULTR which, as we have formerly discussed, is necessary to compensate for the capitalization on random error inevitable in multiple regression. These shrunken coefficients, then, have been found through statistical manipulation, or through lookup in Table IV-11(B), rather than through empirical cross-validation. Again, mechanics is not highest, and creativity not the lowest, of the shrunken correlations. Column E exhibits a transformation of Column D, pumping up the correlation to compensate for the unreliability of the criterion scores. Column E, therefore, reflects the true population correlation which might be expected from the 30 proxes under the case of perfectly reliable judge ratings, and after eliminating the capitalization on random variation in the proxes. Thus the correlations in Column E are the best evidence to date about what success we theoretically would have in predicting the important qualitative dimensions of ideas, organization, style, mechanics, and creativity, using only computer-measured variables in the prose.
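The two adjustments behind Columns D and E of Table V-11 can be sketched as follows. The report cites McNemar (1962) without reproducing the formulas, so both are assumptions here: a Wherry-type shrinkage, R'^2 = 1 - (1 - R^2)(N - 1)/(N - k - 1), and the usual correction for criterion unreliability, r divided by the square root of the criterion reliability. All function and variable names are illustrative.

```python
import math

N_CASES = 256        # essays
N_PREDICTORS = 30    # proxes

# Columns B (group reliability) and C (raw multiple R) from Table V-11.
table_v11 = {
    "Ideas or Content": (0.75, 0.72),
    "Organization": (0.75, 0.62),
    "Style": (0.79, 0.73),
    "Mechanics": (0.85, 0.69),
    "Creativity": (0.72, 0.71),
}


def shrink(multiple_r, n_cases, n_predictors):
    """Assumed Wherry-type shrinkage of a multiple R for chance fitting."""
    r2 = 1.0 - (1.0 - multiple_r ** 2) * (n_cases - 1) / (n_cases - n_predictors - 1)
    return math.sqrt(max(r2, 0.0))


def correct_for_attenuation(r, criterion_reliability):
    """Estimate the correlation with a perfectly reliable criterion."""
    return r / math.sqrt(criterion_reliability)


results = {}
for trait, (reliability, raw_r) in table_v11.items():
    shrunk = shrink(raw_r, N_CASES, N_PREDICTORS)
    corrected = correct_for_attenuation(shrunk, reliability)
    results[trait] = (shrunk, corrected)
    print(f"{trait:18s} shrunk R = {shrunk:.3f}  corrected = {corrected:.3f}")
```

Under these assumed formulas the computed values fall within about .01 of the tabled Columns D and E. The small differences may reflect rounding or a slightly different form of the shrinkage formula in the original computation.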

For all five traits, we have seen an ability to predict the "true" ratings with a rather surprising degree of accuracy.

Summary. This chapter has broken down the evaluation of essays into important dimensions, and has investigated strategies in predicting human judgments of these dimensions. New ratings were generated for 256 essays, with eight expert teachers, drawn from a sample of 32, independently grading each essay, on five traits commonly accepted as important. The judge intercorrelations, and trait differences, were shown. Then the chapter indicated how the proxes differentially contributed to the traits, so that spelling errors contributed to the evaluation of mechanics far more than they did to that of the other traits. Nevertheless, as one would expect from the halo effect demonstrated here by correlation and by analysis of variance, there was a great similarity in the lists of high contributors to the various traits. Some investigation was made of refining the criterion by gathering together similar judge viewpoints, and this possibility was recommended for further exploration. Finally, the overall ability of the system to predict the various traits was tested, and it was found that, contrary to what some might argue, such presumably lofty and subjective traits as creativity could be evaluated, using the present strategies, as effectively as the presumably more objective trait of mechanics. All in all, there did not appear to be any general area of essay evaluation which seemed, on any a priori grounds, beyond the possibility of automatic evaluation and analysis.

CHAPTER VI

PROBLEMS OF STATISTICAL IMPROVEMENT IN PREDICTION

The prior chapters have reported on work done over more than two years in analyzing essays mechanically. As has been seen, the success to date has been striking, although in a number of ways the reported strategies are surely less than optimum. One of the possibilities for improvement would appear to be in a more sophisticated strategy of statistical prediction. This chapter tells of explorations made into seeking some system other than the basic linear one of most multiple regression programs. Much of this chapter is based upon a report made by the authors (Paulus and Page, 1967) at an Annual Meeting of the American Psychological Association, in Washington, D.C.

The problem of linearity. A standard multiple regression program calculates an equation of the type shown as equation (1) in Figure VI-1. In this equation $b_1$ to $b_{30}$ represent computer-calculated weights for each of the proxes $x_1$ to $x_{30}$. These weights are calculated in such a way so as to maximize the correlation between $\hat{Y}$ (the predicted score) and $Y$ (the actual score or rating). We found this correlation to be over .65 (that is, on cross validation and after correction for attenuation). As we have seen from prior chapters, the method works. However, it does not work as well as it could, or perhaps should.
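A minimal sketch of the linear model of equation (1), fitted by ordinary least squares, is given below. The data are synthetic stand-ins (random numbers, not project data), and the use of NumPy's least-squares solver is an assumption about tooling, not a description of the program used in the project.

```python
import numpy as np

rng = np.random.default_rng(0)
n_essays, n_proxes = 256, 30

# Synthetic stand-ins: rows are essays, columns are proxes x_1 .. x_30.
X = rng.normal(size=(n_essays, n_proxes))
hidden_weights = rng.normal(scale=0.2, size=n_proxes)
y = X @ hidden_weights + rng.normal(size=n_essays)   # stand-in for pooled ratings

# Append a column of ones so the constant c is estimated with the weights.
design = np.column_stack([X, np.ones(n_essays)])
coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
b, c = coefficients[:-1], coefficients[-1]

# The multiple R is the correlation between predicted and actual scores.
y_hat = design @ coefficients
multiple_r = np.corrcoef(y_hat, y)[0, 1]
print(f"multiple R in the fitting sample = {multiple_r:.3f}")
```

Because the weights are chosen to fit this one sample, the multiple R printed here is an in-sample value and will generally be larger than the value found in a fresh sample.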

Of the many ways one might attempt to improve statistically upon the method, this chapter will report two, since both are applicable to a wide variety of multivariate predictive problems, not only to the grading of essays.

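FIGURE VI-1

(1) $\hat{Y} = b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_{30} x_{30} + c$

(2) $\hat{Y} = (b_2 x_2 + b_1) x_1 + c$

(3) $\hat{Y} = b_1 x_1 + b_2 x_1 x_2 + c$

(4) $\hat{Y} = \sum_{j=1}^{30} b_j x_j + \sum_{j=1}^{30} \sum_{i=j+1}^{30} b_{ij} x_i x_j + c$

(5) $\hat{Y} = b_1 (x_1 - \bar{x}_1) + b_2 (x_2 - \bar{x}_2) + b_3 (x_1 - \bar{x}_1)(x_2 - \bar{x}_2) + c$

(6) $\hat{Y} = b_1 x_1 - \underline{b_1 \bar{x}_1} + b_2 x_2 - \underline{b_2 \bar{x}_2} + b_3 x_1 x_2 - b_3 x_1 \bar{x}_2 - b_3 x_2 \bar{x}_1 + \underline{b_3 \bar{x}_1 \bar{x}_2} + c$

(7) $\hat{Y} = b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 - b_3 x_1 \bar{x}_2 - b_3 x_2 \bar{x}_1 + c_1$

(8) $\hat{Y} = (b_1 - b_3 \bar{x}_2) x_1 + (b_2 - b_3 \bar{x}_1) x_2 + b_3 x_1 x_2 + c_1$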

Since both employ only existing data, the problems associated with the collection of further data are, therefore, avoided. It is our belief that some of the same problems will often haunt the workers with verbal data of the kind in this project, and therefore this discussion has relevance for other workers concerned with natural language strategies.

The first of the two approaches deals with the use of simple two-way interaction terms in an attempt to increase predictability. The second approach deals with the examination of the relationships between the various proxes and the criterion (the pooled ratings), with the thought of applying transformations to the proxes in an effort to increase the correlations between the proxes and the criterion. One of these uses interactions, then, and the second uses transformations.

Interactions. One way in which one might conceptualize an interaction term in a multiple regression equation is to think of variable weightings of predictor variables. We want the weights received by a given variable not to be a function of that variable's correlation with the criterion and the other independent variables alone, but also to be a function of the subject's score on some other variable. Equation (2) of Figure VI-1 will make this clear. Note that the weight received by $x_1$ in this simple equation is the quantity $(b_2 x_2 + b_1)$: some function of the variable $x_2$ plus the constant $b_1$. Carrying out the indicated multiplication we obtain in equation (3) the simple cross-product of $x_1$ and $x_2$ which, along with the appropriate weight, represents the interaction of $x_1$ and $x_2$ on the criterion. Generalizing from this simple case, we can see that any number of cross-products (i.e., interactions) may be included in a multiple regression equation along with linear terms. Given our 30 proxes, then, we can look at 435 two-way interaction terms in addition to the 30 linear terms. This equation is of the form of equation (4) of Figure VI-1.
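The count of 435 two-way terms and the equivalence of equations (2) and (3) can be checked with a short sketch. The prox values and the weights below are arbitrary synthetic numbers chosen only for illustration.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(1)
proxes = rng.normal(size=(256, 30))      # synthetic stand-in for 30 proxes

# Every unordered pair (i, j) with i < j yields one cross-product column.
pairs = list(combinations(range(proxes.shape[1]), 2))
interactions = np.column_stack([proxes[:, i] * proxes[:, j] for i, j in pairs])
print(len(pairs), "two-way interaction terms")

# Equation (2): the weight on x1 is itself a function of x2.
# Equation (3): the same prediction written with a cross-product term.
b1, b2, c = 0.4, 0.1, 2.0                # arbitrary illustrative weights
x1, x2 = proxes[:, 0], proxes[:, 1]
equation_2 = (b2 * x2 + b1) * x1 + c
equation_3 = b1 * x1 + b2 * x1 * x2 + c
print("equations (2) and (3) agree:", np.allclose(equation_2, equation_3))
```

The count follows from choosing 2 proxes out of 30 without regard to order: 30 x 29 / 2 = 435.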

Before this equation can be calculated, however, two preliminary problems must be considered. The first problem is illustrated by equations (5) through (8) of that figure. Assume that we want to predict some criterion score $Y$ from three independent variables: $x_1$, $x_2$, and the interaction of $x_1$ and $x_2$, $x_1 x_2$. Equation number (5) gives the three predictor equation with independent variables expressed in deviation form. In equation number (6) the indicated multiplications have been carried out. Underlined terms are all constants and may, therefore, be absorbed in the constant term "c", as shown in equation number (7). After underlined terms have been combined in equation number (8) we find that the weight assigned to $x_1$ is the quantity $(b_1 - b_3 \bar{x}_2)$ and the weight assigned to $x_2$ is $(b_2 - b_3 \bar{x}_1)$. We can see, therefore, that the weights assigned to the linear terms in equation number (8) are distorted by the values of $b_3$, and $\bar{x}_1$ or $\bar{x}_2$. There are only two conditions under which this distortion is not present. First, when $b_3$ equals zero, and second, when the means of the linear terms are equal to zero. Since the interaction term is included precisely because the investigator does not believe that $b_3$ is equal to zero, only one alternative remains: to set $\bar{x}_1$ and $\bar{x}_2$ equal to zero.

Further, and this appears to be contrary to all intuition, the correlation between the interaction terms is affected in a manner similar to the distortion of the weights assigned to the linear terms, unless the means of the linear terms are equal to zero. These distortions generally tend to inflate the correlations between the interaction terms and the criterion. Interaction terms, therefore, generally appear to be more valid than they really would be if the means of the linear terms had been adjusted to zero. If higher order interactions are considered, then the means of the lower order interaction terms must be adjusted to zero before the higher order interaction terms are calculated. Interestingly, the multiple correlation coefficient is not affected by these distortions; the crucial effect these distortions have is in the researcher's interpretation of his results. Hence, as the first step in working with interaction terms in the predictive context outlined above, all linear variables must be standardized to a mean of zero. Another way of phrasing this dictum might well be: "No multiplication without standardization".

The second general problem which must be resolved, prior to the calculation of equation (4) mentioned before, is the selection of useful interaction terms. "Useful" is used here in the sense of an interaction term's ability to increase the multiple correlation. As mentioned before, we have, given 30 proxes, 435 possible interaction terms which could be included in the equation. It seems clear that not all of these interaction terms can be efficiently used in our predictive context. The reason for this is that, in cases where the predictors vastly outnumber the number of subjects, the loss in validity on cross-validation of the linear composite of terms becomes very, very great. A method of selecting useful predictors from all of the possible predictors needed, therefore, to be developed. A standard method usually employed in situations of this type is step-wise multiple regression. However, all step-wise multiple regression computer programs which we were able to find and to examine required that at least one variable-by-variable matrix be stored in core memory of the computer. Given the amount of data available here, this would require a minimum of 200,000 core locations, too many for the computers currently available. As an alternative to the step-wise multiple regression procedure, the following method was employed.
A simple correlation coefficient is calculated between each of the independent variables and the criterion. The absolute values of these correlations are rank-ordered. The largest correlation is selected and the criterion is predicted from that variable which yielded the correlation. This is done for each subject. A new variable is then created which is the difference between the predicted criterion score and the observed criterion score. This new variable has the property of being uncorrelated with the independent variable which was just used. In other words, we now have a variable which does not correlate with the variable that was selected, nor with those portions of the other independent variables which correlate with the variable that was selected. In effect, a series of partial correlations are calculated: the first one being a zero order correlation, the next one a first order correlation, etc. At each step, the original criterion is replaced by the residual, the new variable, and the process is repeated until the residual correlates no longer with any of the remaining independent variables at some reasonable level, say .05. This method provides for a rank-ordering of the predictors. But the method has two weaknesses. First, the method is not as powerful as its converse. (Ease in programming, however, made the present method more desirable at this time.) Second, the method does not really allow for the selection of suppressor variables. This was unfortunate, and the investigators still seek a solution.

An additional problem inherent in all predictor selection techniques is that of cross validation. The validity of a multiple regression equation will, of course, almost always be highest for the sample in which the equation was constructed, and lower in other samples or in the population. As we have pointed out, formulas for estimating the cross-validities of sample multiple regression equations, such as the Wherry formula or the Lord-Nicholson formula, do not apply with complete rigor to situations where predictors were selectively chosen from a large number of predictors. Therefore, cross-validation estimates had best be established empirically. Thus, in this research, the previously described procedures were applied to a random sample of two thirds of our essays; the remaining third was used as a cross-validation sample.
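The procedure described above can be sketched end to end: standardize the proxes, form cross-products, rank predictors by their correlation with a successively residualized criterion, fit a regression on two thirds of the cases, and check it on the remaining third. This is a hedged reading of the text, not the project's program. The data are synthetic, only 8 proxes are used to keep the run small, and the names `make_terms` and `residual_selection` are hypothetical.

```python
from itertools import combinations

import numpy as np


def make_terms(proxes, mean, sd):
    """Standardize first, then multiply: linear terms plus all cross-products."""
    z = (proxes - mean) / sd
    crosses = [z[:, i] * z[:, j] for i, j in combinations(range(z.shape[1]), 2)]
    return np.column_stack([z] + crosses)


def residual_selection(X, y, threshold=0.05):
    """Rank predictors by correlation with a successively residualized criterion."""
    residual = y.astype(float).copy()
    remaining = list(range(X.shape[1]))
    chosen = []
    while remaining:
        r = np.array([np.corrcoef(X[:, j], residual)[0, 1] for j in remaining])
        best = int(np.argmax(np.abs(r)))
        if abs(r[best]) < threshold:      # nothing left above the cutoff
            break
        j = remaining.pop(best)
        chosen.append(j)
        # Predict the current criterion from the chosen variable alone and
        # carry the prediction error forward as the new criterion.
        slope, intercept = np.polyfit(X[:, j], residual, 1)
        residual = residual - (slope * X[:, j] + intercept)
    return chosen


rng = np.random.default_rng(2)
n_essays, n_proxes = 256, 8               # 8 synthetic proxes, not the project's 30
proxes = rng.normal(loc=5.0, size=(n_essays, n_proxes))
ratings = (0.6 * proxes[:, 0] + 0.3 * proxes[:, 1]
           + 0.4 * (proxes[:, 0] - 5.0) * (proxes[:, 1] - 5.0)
           + rng.normal(size=n_essays))

# Random two-thirds for construction, remaining third for cross-validation.
order = rng.permutation(n_essays)
n_train = (2 * n_essays) // 3
train, holdout = order[:n_train], order[n_train:]

mean, sd = proxes[train].mean(axis=0), proxes[train].std(axis=0)
train_X, holdout_X = make_terms(proxes[train], mean, sd), make_terms(proxes[holdout], mean, sd)
train_y, holdout_y = ratings[train], ratings[holdout]

chosen = residual_selection(train_X, train_y)

# Enter the selected terms into an ordinary multiple regression.
design = np.column_stack([train_X[:, chosen], np.ones(n_train)])
weights, *_ = np.linalg.lstsq(design, train_y, rcond=None)
predicted = np.column_stack([holdout_X[:, chosen], np.ones(len(holdout))]) @ weights
holdout_r = np.corrcoef(predicted, holdout_y)[0, 1]
print(f"{len(chosen)} terms selected; cross-validated r = {holdout_r:.3f}")
```

One design choice here is an assumption beyond the text: the means and standard deviations used for standardizing are taken from the construction sample only and then applied to the cross-validation sample, so that nothing from the held-out third enters the fitted equation.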

Table VI-1 presents the data obtained when this cross-validation was applied. Note that only nine interactions and linear terms were selected before the correlation between the residual criterion and any of the predictors failed to exceed .05. These nine variables were entered into a standard multiple regression equation in order to obtain weightings and a measure of their combined predictive power. The obtained results are reported in Table VI-2. As expected, the multiple correlation is somewhat higher than the one obtained when the 30 linear terms were used. This increase, however, cannot be evaluated until the equation is cross-validated and the amount of shrinkage has been discovered. Therefore, the obtained equation was applied to the remaining third of the sample, and the predicted scores were correlated with the observed scores. The correlation coefficient which was obtained was .63. You will note that this coefficient is approximately the same as the shrunken coefficient which was obtained by using the 30 linear terms, the proxes alone. This seems to indicate that we can predict the grade an essay receives as well from nine variables as we could from the original 30. Making use of interaction terms, therefore, does not allow us to predict any better, but rather to predict just as well using far fewer variables. The lack of increase in predictability is puzzling and may perhaps be attributed to the relative instability of the criterion. If the criterion had been more reliable, this method would surely have yielded better results, for the reasons already explained in an earlier chapter.

TABLE VI-1

RANK-ORDERING OF PREDICTOR VARIABLES

                                                        Correlation with
Step   Variable                                         Residual Criterion

1      Standard Deviation of Word Length                     .52
2      Number of Commas                                      .23
3      Length of Essay in Words                              .05
4      Interaction of # of Periods and
       # of Subject-Verb Openings                           -.15
5      Interaction of # of Periods and
       # of Declar. Sentences Type "A"                       .12
6      # of Dashes                                          -.11
7      # of Words on Dale List                              -.10
8      Interaction of # of Periods and
       # of Declar. Sentences Type "B"                       .07
9      # of Connective Words                                 .05

TABLE VI-2

MULTIPLE REGRESSION EQUATION FOR LINEAR TERMS AND INTERACTIONS

[The rows of this table are interleaved in the scan. The entries of each column are listed in the order in which they survive; their alignment with the variables is not recoverable.]

Linear Terms
  Variables:     Length of Essay in Wds.; Number of Commas; S.D. of Word Length
  B-Weight:      .0050, .0278, .0023
  Stan. Error:   .0006, .0096, .0014
  Ratio:         2.8958, 3.8333, 3.5714
  p:             <.01, <.01, <.01

  Variables:     No. of Wds. on Dale Lst.; Number of Connectv. Wds.; Number of Dashes
  B-Weight:      -.0191, .0050, .0223
  Stan. Error:   .0092, .0014, .0097
  Ratio:         2.0761, 3.5714, 2.2990
  p:             <.05, <.05, <.01

Interaction Terms
  Variables:     # of Periods X Subject-Verb Openings; # of Periods X # of Decla. Sents. Type "A"; # of Periods X # of Decla. Sents. Type "B"
  B-Weight:      -.0014, .0203, .0011
  Stan. Error:   .0099, .0005, .0006
  Ratio:         1.7500, 2.2000, 2.0505
  p:             <.05, <.05, <.10

Multiple Correlation = .74
Stan. Error of Y = .471

Transformations. The first step in dealing with the relationships between each of the proxes and the criterion was to develop a short computer routine which would graph the relationship between each of the proxes and the criterion. Sample graphs for five variables are shown in Figures VI-2 to VI-6. It was hoped that by examining such graphs insights could be obtained which would aid one in selecting transformations to be applied to the proxes so as to yield higher correlations with the criterion.

On these graphs, the x or horizontal axis represents the independent variable, the prox. The y or vertical axis represents the ratings which an essay received. A rating of 1 was the lowest rating an essay could receive; a rating of 5 was the highest. Each of the graphs was carefully examined in an effort to determine if any reasonable curve might explain the data better than a straight line. It appeared that for several of the graphs this might well be the case. Examine, for example, the graph for variable number 8 (Figure VI-2). The curve indicated by the dotted line may well fit the data better than the straight line. Both have been indicated on that graph.

FIGURE VI-2. Variable Number 8, Scale Number 1. Rating plotted against Number of Commas (r = .34). [Plot not reproduced.]

Note: For this and subsequent graphs, the horizontal axis has been divided into 50 (usually equal) intervals. Points representing a substantial number of cases (usually 10) have been circled and the number of cases has been indicated.

FIGURE VI-3. Scale Number 1. Rating plotted against Number of Exclamation Marks. [Plot not reproduced.]

FIGURE VI-4. Variable Number 22. Rating plotted against Number of Common Words on Dale List. [Plot not reproduced.]

FIGURE VI-5. Variable Number 23, Scale Number 1. Rating plotted against Number of Sentences with End Punct. Present. [Plot not reproduced.]

FIGURE VI-6. Variable Number 29. Rating plotted against Standard Deviation of Word Length (r = .52). [Plot not reproduced.]

In order to sequentially apply transformations to the proxes, the following techniques were employed. A FORTRAN II program was written for the IBM 1620 computer (chosen locally for its auxiliary equipment and accessibility) which allows a researcher to apply real-time transformations to the data. The program calculates means and standard deviations of both prox and criterion, and the correlation coefficient between the two. Next, the relationship between the two variables is plotted via an IBM 1627 plotter. These plots are similar to the ones in Figures VI-2 to VI-6, except that the points are connected and that a complete plotting grid is supplied. After examining the plot, the researcher can apply to the data one of 14 transformations (or any combination of these transformations). The entire process is then repeated until the researcher decides to stop. The current values of the variables are then punched out on cards. This process is illustrated by Figure VI-7.
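One cycle of the transformation procedure described above (summarize, transform the prox, summarize again) might look like the sketch below. The 14 routines of the original FORTRAN II program are not listed in the text, so the three entries in `TRANSFORMS` are hypothetical stand-ins, the data are invented for illustration, and the use of the population standard deviation is an assumption.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical routine table: name -> function of (value, parameter).
TRANSFORMS = {
    "sqrt": lambda x, p: math.sqrt(x),
    "log": lambda x, p: math.log(x + p),
    "power": lambda x, p: x ** p,
}


def summarize(xs, ys):
    """Means, standard deviations, and correlation, as the program printed them."""
    return {
        "mean_x": mean(xs), "sd_x": pstdev(xs),
        "mean_y": mean(ys), "sd_y": pstdev(ys),
        "r": pearson(xs, ys),
    }


def apply_transform(xs, name, parameter=1.0):
    """Apply one named routine to every value of the prox."""
    return [TRANSFORMS[name](x, parameter) for x in xs]


# Invented data in which the rating rises with the square root of the count.
commas = [0, 1, 4, 9, 16, 25]
ratings = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]

before = summarize(commas, ratings)
after = summarize(apply_transform(commas, "sqrt"), ratings)
```

Comparing `before["r"]` with `after["r"]` corresponds to the researcher's decision, after each plot, whether to keep a transformation or try another.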

As the number of cases increases, this process becomes painfully slow on the 1620. (Compilation alone took about 20 minutes, and for 200 cases some transformations required as much as 15 minutes.) As a result, Paulus converted this program to run on our time-sharing teletype console, which is connected with an IBM 7094 at the Massachusetts Institute of Technology. Again, it appeared that the relative instability of the criterion limited the usefulness of this approach.

Discussion. To date, then, the Project has examined some methodological problems dealing with nonlinearity in predicting grades on essays. So far, however, we have not been able to substantially increase predictability by these methods, beyond that obtained under a naive linear assumption. It is our feeling that this may be due to the instability of the ratings of the essays, and, of course, the lack of more sophisticated proxes. Given the loose ratings used so far, it seems relatively unimportant what combinations of proxes are used, what transformations are applied to some of them, or what interactions are considered. The multiple correlation, after cross-validation, appears to have stabilized at about .65.

FIGURE VI-7

SAMPLE OUTPUT FROM TRANSFORMATION PROGRAM

    PROBLEM NUMBER 076
    CURVE FITTING PROGRAM

    THERE ARE 50 OBSERVATIONS
    PLEASE CHECK SENSE SWITCH SETTINGS AND PRESS START WHEN READY

    MEANS AND STANDARD DEVIATIONS
    MEAN OF X = 1.9800
    MEAN OF Y = 2.8890
    S.D. OF X = .9637
    S.D. OF Y = .5703
    THE CORRELATION COEFFICIENT BETWEEN X AND Y IS .0720

    IF STOP WRITE 9, ELSE 5
    I AM READY TO ACCEPT ROUTINE NUMBER AND PARAMETER
    12 2

    MEANS AND STANDARD DEVIATIONS
    MEAN OF X = 6.4369
    MEAN OF Y = 2.8890
    S.D. OF X = 2.0449
    S.D. OF Y = .5703
    THE CORRELATION COEFFICIENT BETWEEN X AND Y IS .1621

    IF STOP WRITE 9, ELSE 5
    I AM READY TO ACCEPT ROUTINE NUMBER AND PARAMETER
    12 2

    MEAN OF X = 52.6452
    MEAN OF Y = 2.8890
    S.D. OF X = 10.8134
    S.D. OF Y = .5703
    THE CORRELATION COEFFICIENT BET. X AND Y IS .1423

    IF STOP WRITE 9, ELSE 5
    I AM READY TO ACCEPT ROUTINE NUMBER AND PARAMETER
    6

    MEANS AND STANDARD DEVIATIONS
    MEAN OF X = 6.4369
    MEAN OF Y = 2.8890
    S.D. OF X = 2.0449
    S.D. OF Y = .5703
    THE CORRELATION COEFFICIENT BETWEEN X AND Y IS .1621

    IF STOP WRITE 9, ELSE 5
    END OF JOB

Note: In the original figure, all input is underlined.

There are at least two general ways in which such work may proceed in the future. The first is to recognize that there are differences among raters, and to attempt to empirically establish groups of raters, then to attempt to describe the characteristics of these groups. Some steps in this direction have been reported in the prior chapter. Then multiple regression equations, employing the previously discussed techniques, can be calculated for each homogeneous group of raters. We have recently completed a factor analysis of the 32 raters who rated our essays. However, since not all essays were rated by the same raters, we find that our data matrix contains more missing data than existing data. So we suspect that at least some of the factors which we empirically isolated are missing-data and/or content factors.

A second general approach deals with differentially weighting raters before composite scores are calculated. This approach requires some judgment about the relative validity of each rater. Since we have no essays which have been rated by all of the raters, this poses some problems. One approach seems promising. This involves factor analyzing the raters and using their factor scores (or some function of them) on the first principal component as weights.
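The weighting idea above can be illustrated roughly as follows. The sketch assumes complete data (every rater rated every essay), which the text says was not the case for the project, and it takes each rater's loading on the first principal component of the rater intercorrelation matrix as that rater's weight, found here by simple power iteration. The function names and the three invented raters are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def first_component_weights(ratings_by_rater, iterations=200):
    """Unit-length loadings on the first principal component of the
    rater-by-rater correlation matrix (power iteration)."""
    k = len(ratings_by_rater)
    corr = [[pearson(a, b) for b in ratings_by_rater] for a in ratings_by_rater]
    vec = [1.0] * k
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = sum(v * v for v in nxt) ** 0.5
        vec = [v / norm for v in nxt]
    return vec


def weighted_composite(ratings_by_rater, weights):
    """Weighted mean rating for each essay."""
    total = sum(weights)
    n_essays = len(ratings_by_rater[0])
    return [sum(w * r[i] for w, r in zip(weights, ratings_by_rater)) / total
            for i in range(n_essays)]


# Invented ratings of five essays by three raters; the third rater
# agrees less with the other two.
raters = [[1, 2, 3, 4, 5], [1, 2, 3, 5, 4], [3, 1, 4, 2, 5]]
weights = first_component_weights(raters)
composite = weighted_composite(raters, weights)
```

In this toy case the rater who agrees least with the others receives the smallest weight, which is the behavior the approach is meant to produce.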

Summary. In general, the investigators feel that workers with verbal data should be pleased but not contented with the present state of the art, and with the results obtained from using linear regression analyses. And they should continue linear analysis for the time being. But they should take care, whenever in doubt, to cross-validate the results. Further statistical optimization will probably be eventually profitable, when larger changes have been made in other aspects of the work.

CHAPTER VII

PHRASE LOOKUP AND ITS APPLICATIONS

The work described in the chapters up to this point has been limited by the computer program which has been used, and which has been shown in Appendix A. While this program, called PEG, is modular, mnemonic and flexible, it lacks any real convenience in looking up phrases. The present chapter describes a phrase look-up algorithm to accompany the program for essay analysis, and describes some studies done with the algorithm.

The phrase look-up procedure. The phrase look-up algorithm for this project was designed primarily by Donald R. Marcotte, and formed part of his M.A. thesis (Marcotte, 1966). Much of the present description is from his thesis or from the related report by Marcotte, Page, and Daigon (1967).

In one sense, of course, phrase lookup requires no special program. It is easy to insert in a FORTRAN program a conditional transfer of the form:

      IF (WORD(I).EQ.X.AND.WORD(I+1).EQ.Y) GO TO ...

Here we have tested whether two words in a sequence of text words matched two words from some phrase. If the first text word in the sequence is not the same as X, then the test has failed, and the GO TO will not be executed. And if the first word is X, but the second text word is not the same as Y, then again the test has failed, and the GO TO will not be executed.
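For readers more at home in a modern language, the same hard-coded two-word test can be written as below. This is only a paraphrase of the FORTRAN conditional in Python; the function name `contains_pair` and the sample sentence are invented for illustration.

```python
def contains_pair(words, first, second):
    """True if `first` is immediately followed by `second` somewhere in `words`."""
    return any(a == first and b == second for a, b in zip(words, words[1:]))


sentence = "that is only wishful thinking".split()
has_cliche = contains_pair(sentence, "wishful", "thinking")
```

Each additional phrase would need another such call (or another IF statement), which is the scaling problem the following paragraphs take up.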

Such a test, however, lacks efficiency, and as a list of phrases of interest becomes large, would become very cumbersome to program, organize, and alter. What is desired is a procedure which permits search through a simply presented list of phrases, a list which may be regarded in the same way as the dictionaries in the main analysis program. And it is this need which the subroutine PHRASE was designed to fill. Appendix C has the source program listing for PHRASE.

In order to implement PHRASE, a skeleton copy of the PEG program was used to assemble the sentences of the essay being read, in the way already described. Also, the main program was used to read the array of first words of the phrases, and to read in the full phrase matrix.

One sentence is obtained from the essay being corrected. A phrase-within-quotation-marks (PWQM) counter, a PWQM indicator, and an adjusted word counter are set to zero. The PWQM counter is incremented every time a phrase is enclosed within quotation marks. The PWQM indicator provides a symbol, either 0 or 1, for punched-card output. The adjusted word counter eliminates unnecessary processing of words that have already been identified as part of a phrase. Since phrases of only two or more words are processed, the index indicating the number of words in the sentence is reduced by two, because the last word and end punctuation need not be processed.

DO LOOPS are set up which cause the computer to cycle automatically until certain criteria have been met. The initial DO LOOP provides for the search of a sentence for a word that belongs to an array of first words of phrases. Prior to doing this, a test is conducted to determine if the index indicating the ordinal position of the word in the sentence is less than the value of the adjusted word counter. If this index is less than the adjusted word counter no cycling occurs, since the word being analyzed has already been processed, or it is the first word of the sentence. If the index is equal to or greater than the adjusted word counter, the word is processed.

A provision is made to eliminate the processing of both parts of a natural-language word. This is necessary since a computer word (on the IBM 7040) consists of only six letters while many words in the English language contain more. Therefore, all natural-language words are represented by two computer words. This means that it is possible to identify the first part of an English word and also attempt to identify the second part of the same word. This possibility is eliminated by an appropriate test. The test is made by dividing the index indicating the ordinal position of the natural-language word in the sentence. If the natural-language word has been processed previously, then no cycling occurs, and the next computer word is examined.

Because the computer cannot differentiate between natural-language words and punctuation marks, a test is conducted to determine whether the unit being analyzed is a punctuation mark. If this is so, no cycling occurs; but if the unit is not a punctuation mark, cycling does occur. As was noted earlier, each natural-language word needs two computer words. Therefore the second DO LOOP requires two comparisons for each word provided to it. These two comparisons result in the identification of the particular phrase for which processing occurs.
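The two-computer-word representation mentioned above can be pictured with a small sketch. The six-character word size comes from the text; the upper-casing, the blank padding, and the dropping of letters beyond the twelfth are assumptions made only for illustration, and `to_computer_words` is a hypothetical name.

```python
WORD_SIZE = 6  # characters per computer word, as stated for the IBM 7040


def to_computer_words(word):
    """Represent one natural-language word as two blank-padded
    six-character computer words."""
    padded = word.upper().ljust(2 * WORD_SIZE)[: 2 * WORD_SIZE]
    return padded[:WORD_SIZE], padded[WORD_SIZE:]


def encode_sentence(words):
    """Flatten a sentence into computer words; natural-language word k
    occupies positions 2k and 2k + 1 of the result."""
    encoded = []
    for word in words:
        encoded.extend(to_computer_words(word))
    return encoded


first_half, second_half = to_computer_words("opinion")
encoded = encode_sentence(["in", "my", "opinion"])
```

Because every natural-language word occupies exactly two positions, a lookup has to step through the encoded sentence two positions at a time and compare both halves, which corresponds to the two comparisons per word described in the text.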

After identifying the specific phrase, the adjusted word counter is incremented by two because two computer words have been processed. The index for the print-out array, RC, is set equal to two, and the two identified computer words are placed in the array, RC. The value of the row counter replaces the value of another row counter needed to process the phrase.

The third DO LOOP provides for the comparison of each natural-language word following the identified word with each natural-language word in the specific phrase. During each cycle a test is made to determine if the computer word of the sentence is the same as the computer word of the phrase. If it is, then the RC array index is incremented by one and the computer word is placed in the array, RC. If no comparison is made, then a test is made to determine if the symbol identifying the end of the phrase is present. If so, then indices for the identification of the presence of quotation marks are established. This is done in two steps: (1) by replacing the first index with the ordinal value of the natural-language word preceding the first word of the phrase in the sentence and (2) by replacing the second index with the ordinal position of the natural-language word succeeding the last word of the phrase in the sentence. The second index may have one of two values. This permits the identification of phrases that are not only enclosed within quotation marks but also have punctuation marks within the quotation marks. The first of the above alternatives is examined, and if the phrase is not enclosed solely within quotation marks then the second alternative is employed. If neither alternative is correct, then the phrase counter is incremented by one, and the computer word following the last natural-language word in the phrase in the sentence is cycled.

If in the test for an ending symbol, no comparison is made, then the index for the array, RC, is tested to determine if less than four natural-language words are in the array. The fourth word is not tested because phrases consisting of four words have no end symbol. If there are fewer than four words in the RC array, then the original row index counter is incremented and the next phrase is processed. This is done because several phrases begin with the same word, and it is necessary to examine all phrases having initial words in common. Continual incrementing of the row index counter occurs until all phrases beginning with the identified word in the sentence have been analyzed. This also means that one extra word will be analyzed, the initial word of the phrase following the phrases that have been examined, because the number of phrases beginning with the same first word is not constant. That is, it is not possible to determine when the series of phrases beginning with the same first word ends. Therefore the added comparison is made.

Once the phrases are identified, it is necessary to record the information for "output." The output is punched on cards as well as printed on paper. There are two sets of punched card output: (1) the cards containing the identification number of the essay, the identification number of the phrase, the symbol indicating whether the phrase is enclosed within quotation marks or not, and the identified phrase, and (2) the cards containing the identification number of the essay, the total number of trite expressions used in the essay, and the total number of trite expressions enclosed within quotation marks. The printed output is an amalgamation of (1) and (2) above.
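The overall flow of the lookup described above (index phrases by first word, try every phrase sharing that first word, skip words already consumed, note enclosing quotation marks) can be sketched in a few lines. This is a loose modern paraphrase, not the PHRASE subroutine itself: it works on whole words rather than six-character computer words, it prefers the longest matching phrase (a design choice of this sketch), it only handles the simple case of quotation marks directly around the phrase, and all names are hypothetical.

```python
from collections import defaultdict


def build_index(phrases):
    """Group phrases by first word, like the array of first words in the text."""
    index = defaultdict(list)
    for phrase in phrases:
        words = tuple(phrase.lower().split())
        if len(words) >= 2:  # only phrases of two or more words are processed
            index[words[0]].append(words)
    return index


def find_phrases(sentence_tokens, index):
    """Return (position, phrase, quoted) for each phrase found in one sentence.
    Punctuation marks are assumed to arrive as separate tokens."""
    tokens = [t.lower() for t in sentence_tokens]
    found = []
    next_free = 0  # plays the role of the adjusted word counter
    for i, token in enumerate(tokens):
        if i < next_free or token not in index:
            continue
        # Try every phrase sharing this first word, longest first.
        for phrase in sorted(index[token], key=len, reverse=True):
            end = i + len(phrase)
            if tuple(tokens[i:end]) == phrase:
                quoted = (i > 0 and tokens[i - 1] == '"'
                          and end < len(tokens) and tokens[end] == '"')
                found.append((i, " ".join(phrase), quoted))
                next_free = end
                break
    return found


index = build_index(["each and every", "in my opinion", "in the long run"])
sentence = ["in", "my", "opinion", ",", '"', "each", "and", "every", '"',
            "student", "wins", "."]
matches = find_phrases(sentence, index)
quoted_total = sum(1 for _, _, quoted in matches if quoted)
```

Here `len(matches)` and `quoted_total` correspond to the two totals punched on the second set of cards: the number of trite expressions in the essay and the number enclosed within quotation marks.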

The final DO LOOP provides for the replacement of each word in the RC array by zero.

An application to clichés. Beyond constructing the described algorithm, the main purpose of Marcotte's study was to find how important clichés may be in the computer evaluation of student essays. Surely, according to English texts, such patterns of writing would be presumed to handicap an essay's evaluation, and might be expected to correlate negatively with human judgments.

Background on clichés. A cliché has been defined by Partridge (1962) as "...an outworn commonplace; a phrase, or short sentence, that has become so hackneyed that careful speakers and scrupulous writers shrink from it because they feel that its use is an insult to the intelligence of their audience or public." The searching of essays for clichés is a tedious if not impractical task. Certain clichés such as "each and every" and "null and void" seem to blend into a sentence so that they are not easily seen on the first reading. Second and third readings are often necessary to identify the cliché or clichés in the essay. The task, therefore, of spotting clichés seems insurmountable when there are several hundred essays to be examined, particularly when the essay has to be graded for other factors such as creativity, mechanics, style, organization, and ideas or content. The time required to make just one very detailed reading and commentary, a minimum of fifteen minutes (Daigon, 1966), is considerable, but when two or three readings are required the time multiplies greatly. Because clichés are clearly defined word groups, a computer search strategy is very efficient. Clichés can be stored in the computer and exact comparisons made.

LaBrant (1949) has discussed the difficulty of being sure when a cliché is hackneyed to the person using it, and Fowler (1965) has pointed out that every cliché seems fresh and novel at some time to the user. And Guth (1964) has warned against the "overzealous avoidance" of phrases which might seem trite, saying that there is a "not always clearly distinct borderline between the hackneyed and the idiomatic" (p. 194).

Partridge (1962), however, has approached the problem more systematically by providing a rather extensive list of clichés in dictionary form. He categorizes each cliché into one of four groups:

1. Idioms that have become clichés.

2. Other hackneyed phrases. Groups (1) and (2) form at least four-fifths of the aggregate.

3. Stock phrases and familiar quotations from foreign languages.

4. Quotations from English literature.

Other noteworthy aspects of Partridge's dictionary are definitive information and specific examples for each group of clichés, and the annotation of some clichés to indicate that these are considered particularly hackneyed or objectionable. Furthermore, Warriner (1951) and Griffith (1957, pp. 263-4) have supplied clichés not on Partridge's lists, and still others have been supplied by personal advice of Dr. Arthur Daigon. Three hundred clichés were included in this portion of the study, divided into two groups of 150 each: (1) clichés considered by Partridge to be "particularly offensive," and (2) others which were presumably not so odious. These lists may be found in Marcotte (1966, App. C), and will not be presented here.

Of the 256 essays examined, only 58 contained any occurrences of the cliché phrases, and there were only 74 occurrences altogether. The number of different clichés used is only 24, and these are listed, together with their frequency of occurrence, in Table VII-1. An examination of these shows a rather large loading on two phrases: "finer things" and "in my opinion". When it is remembered that this particular essay was on whether, in a student's opinion, the best things in life are really free, it is understandable why these should occur so often. And these two phrases are seen to be pretty meaningless for any general conclusions. Of Partridge's "particularly offensive" phrases only eight were found, for a total frequency of only 13. In general, the clichés actually found do not seem necessarily very handicapping.

TABLE VII-1

TRITE PHRASES FOUND IN HIGH SCHOOL ESSAYS

Frequency    Cliché

     3       all in all
     2       by the same token
     1       common understanding
     1       each and every
    14       finer things
     1       first and foremost
     3       helping hand
     1       high and dry
    19       in my opinion
     7       in the long run
     1       let's face it
     1       matter of fact
     3       more or less
     1       really and truly
     1       reigns supreme
     2       root of all evil
     1       step by step
     2       survival of the fittest
     5       this day and age
     1       through and through
     1       through thick and thin
     1       to say the least
     3       wishful thinking
     1       work and no play

This intuitive feeling is borne out by Marcotte's statistical comparisons between those essays containing clichés and those not containing them. The mean ratings of the two groups (one with 58 essays, the other with 198 essays) were compared using a random-sample t-test, one-tailed because of the natural assumption that the non-cliché essays would be presumably superior. No such evidence was found. To the contrary, for one trait (Ideas) the difference between group ratings was even in the wrong direction, and happened to account for the largest t-ratio found (1.37). And none of the t-tests approached significance.

We may infer that these particular lists of clichés, which are apparently as authoritative as any, do not aid in predicting whether an essay will be judged to have superior ideas, organization, style, mechanics, or creativity. Often findings of "no significant differences" are depreciated as inconclusive, or uninteresting to science. Here, however, where the data are drawn from a naturalistic essay situation and evaluated by realistic judges, such null-hypothesis findings seem to have great relevance. The avoidance of hackneyed phrases is often a subject of teaching in courses in composition, and this study casts a considerable shadow over the importance of the topic, at least in the secondary grades here sampled.
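A two-group comparison of the kind described above can be sketched as follows. The text does not say which form of the t-test Marcotte used, so the pooled-variance form here is an assumption, and the ratings are invented solely to show the calculation; they are not the project's data.

```python
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Two-sample t statistic with pooled variance and its degrees of freedom.
    Positive when group_a has the higher mean."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a)
              + (nb - 1) * variance(group_b)) / (na + nb - 2)
    standard_error = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (mean(group_a) - mean(group_b)) / standard_error, na + nb - 2


# Invented ratings on one trait for essays without and with clichés.
without_cliches = [3, 4, 2, 5, 3, 4]
with_cliches = [3, 3, 4, 2, 4]
t_ratio, degrees_of_freedom = pooled_t(without_cliches, with_cliches)
```

For a one-tailed test of the hypothesis that the non-cliché essays are superior, only a sufficiently large positive `t_ratio` would count as support; a negative value, as reported for Ideas, runs against the hypothesis whatever its size.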

A search for psychological characteristics. Another application of the phrase look-up algorithm was in a study of what might be called quasi-psychological characteristics of prose (Hiller, Page, and Marcotte, 1967). This study was a combination of the strategies and methods used in this overall project, together with some of the subjective list-generation character of the General Inquirer (see Stone et al., 1966).

Traits were postulated which Hiller called "opinionation," "vagueness," and "specificity-distinctions," and for which he subjectively generated some phrase dictionaries, for intended use with the PHRASE subroutine already described. For the trait of opinionation, phrases were listed such as "I feel," "I think," "in my opinion," "who can doubt," etc., and the list included such apparent indicators of certainty as "all," "always," "beyond a doubt," etc., since opinionation and such certitude were believed to have something in common. All told, 130 items were included.

The other traits were similarly generated from an intuitive basis, supported by general admonitions in Strunk and White (1965). "Vagueness" was believed by Hiller to be indicated by such qualifiers as "probably," "usually," "a matter of opinion," "generally," etc. This category of vagueness contained 60 items. And "specificity-distinctions" was believed to be indicated by words implying a specific, or concrete, point of view, such as "analyze," "specifically," "ambiguous," "exception," "distinction," etc. This list contained 90 words or phrases.

These phrase lists, then, were looked up in the 256 essays, and their correlations were studied with the same five traits of essay quality. To eliminate the general factor of length, the frequency of occurrences of such phrases should properly be divided by the total number of words of an essay, just as was done with other proxes. The correlations of these new proxes with the five trins are shown in Table VII-2. All correlations are in the predicted direction, and a number of them are highly significant, given the large number of essays represented. At first glance, then, the findings of Table VII-2 seem to lend some support for a kind of construct validity of the three traits postulated.

TABLE VII-2

CORRELATION OF FIVE MAJOR TRINS WITH "OPINIONATION," "VAGUENESS," AND "SPECIFICITY" (N = 256)

Trins               "Opinion."    "Vague."    "Specif."

1. Ideas              -.17*        -.26*        .08
2. Organization       -.20*        -.15*        .15*
3. Style              -.16*        -.22*        .10
4. Mechanics          -.14         -.14         .10
5. Creativity         -.14         -.32*        .04

Means                  9.1         15.0         2.0
St. Deviat.            7.7          6.3         2.0

*Significant at the .01 level, with a one-tailed test.

Note: All correlations are based upon prox proportions, but the means and s.d.'s are raw frequencies.
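The length correction mentioned above (phrase frequency divided by total words) amounts to the small calculation sketched here. The function name, the two-item phrase list, and the sample essay are all hypothetical and serve only to show the arithmetic.

```python
def phrase_proportion(words, phrases):
    """Occurrences of any listed phrase divided by the essay's word count."""
    lowered = [w.lower() for w in words]
    hits = 0
    for phrase in phrases:
        target = phrase.lower().split()
        n = len(target)
        hits += sum(1 for i in range(len(lowered) - n + 1)
                    if lowered[i:i + n] == target)
    return hits / len(lowered) if lowered else 0.0


essay = "I think it is probably true and I think so".split()
proportion = phrase_proportion(essay, ["I think", "probably"])
```

Two essays with the same raw count of such phrases but different lengths would thus receive different prox values, which is the intent of the correction.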

Unfortunately, further analysis leaves the question very much in doubt, and the sum of evidence seems somewhat more negative than positive. It can be remembered from earlier chapters of this report that some of the largest correlates obtained with most of the trins were those proxes based on vocabulary: Dale common word list, average word length, and standard deviation of word length. These three are all presumably correlated with an inferred "frequency-score" of a student: the more words he uses which are infrequent, the more favorable will be these various vocabulary measures, and the higher will be his probable ratings.

These traits of "opinionation," "vagueness," and "specificity" were generated by uncontrolled subjective procedures, which would of course have no built-in safeguards against correlations with these other important proxes. Even the examples given here suggest a bias along the frequency dimensions: "I," "my," "always," "all," "probably," "usually," strike one as fairly commonplace, whereas "analyze," "ambiguous," "distinction," etc. are drawn from a less frequent set of terms. This supposition is borne out by the evidence in Table VII-3.

Here it is evident that the presumed dimensions are well enough correlated with prior vocabulary proxes so that the new evidence of correlation with essay quality does not contribute substantially in any search for construct validity for the new lists. If the lists happen to strike a reader as persuasive, then the measures, individual though they are, can be said to possess some face validity. But apparently we still do not have any more compelling evidence for their being important measures in their own right. This is a problem that is common in content analysis work. The problem is shared by the "dictionaries" used in most of the General Inquirer work, as we have already noted, and the General Inquirer was clearly one of the two models for this sub-study.

TABLE VII-3

CORRELATIONS OF VOCABULARY MEASURES WITH "OPINIONATION," "VAGUENESS," AND "SPECIFICITY" (N = 256)

Vocabulary             "Opinion."    "Vague."    "Specif."

Dale list                 .32          .16         -.14
Aver. Wd. Leng.          -.43         -.06          .24
St. Dev. Wd. Leng.       -.18         -.19          .19

Only proportions are used for the column proxes.

More important, from an essay-analysis viewpoint, is the fact that the multiple-R for predicting essay quality does not seem to be increased by these traits of "opinionation," "vagueness," and "specificity." At first they seemed to one worker to contribute some new variance, but to date no cross-validation has shown significant improvement in the prediction through use of these hypothesized variables. This has some meaning for the major future developments in essay analysis, as will be discussed in a later chapter.

Correlative conjunctions. Some other types of routines have been developed for sequences of words which may be separated by other words. One worker, Alice Trailor, was curious about the use of correlative conjunctions, such as either...or, neither...nor, etc. Her reasoning was that sentences utilizing phrasal, clausal, parenthetical, or transitional elements would be indicative of a more mature or sophisticated style. And devices which provide means for coordination or subordination, such as correlative conjunctions, might be expected to predict human essay evaluations.

To test this relationship Miss Trailor used a lexicon of 11 common correlative conjunctions, taking Pence (1947) as a guide. For the 256 "free" essays, the resulting frequencies of such correlative conjunctions are shown in Table VII-4. Obviously, certain items dominated the usage of the high school students concerned, especially either...or and if...then, which together accounted for more than half of the occurrences. And with the judged quality of essays, these tiny frequencies had correlations hovering around zero, with the highest for any trait being a (non-significant) -.11 with rated creativity.

TABLE VII-4

DISCOVERED FREQUENCIES OF CORRELATIVE CONJUNCTIONS

Correlative                Frequency

either...or                    57
neither...nor                  17
both...and                     26
not only...but also             7
not only...but                  8
if...then                      44
although...still                3
although...yet                  0
though...still                  0
though...yet                    0
since...therefore               0

Total                         162

This particular investigation, then, explored one small facet of language usage in high school essays. The hypothesis that correlative conjunctions might furnish additional clues to writing quality was not supported by the data, but an algorithm was developed to permit the searching for separated words and word clusters in the text.
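A search for separated word pairs of the kind just described might be sketched as below. The text does not give the details of the original routine, so the rule used here (the second word may appear anywhere later in the same sentence) is an assumption, only single-word pairs from Table VII-4 are included, and the names are hypothetical.

```python
# A subset of the pairs in Table VII-4; multi-word items are left out.
CORRELATIVES = [("either", "or"), ("neither", "nor"),
                ("both", "and"), ("if", "then")]


def count_correlatives(sentence_words, pairs=CORRELATIVES):
    """Count each pair whose second word follows its first word,
    at any distance, within one sentence."""
    words = [w.lower() for w in sentence_words]
    counts = {}
    for first, second in pairs:
        n = 0
        start = 0
        while first in words[start:]:
            i = words.index(first, start)
            if second not in words[i + 1:]:
                break
            n += 1
            start = words.index(second, i + 1) + 1  # resume after the partner
        counts[f"{first}...{second}"] = n
    return counts


counts = count_correlatives("Either you pay or you leave".split())
```

Matching is on whole words, so "neither" does not trigger the either...or pair; a looser rule would inflate the counts.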

Verb constructions. Another investigator was interested in whether type of verb syntax would help predict essay quality. Thomas F. Breen pointed out that many textbook writers for composition teaching inveigh against the use of the passive voice, and claim that the active voice is almost always to be preferred (Gleason, 1965). But one would believe that perfect tenses, since they differentiate time, would characterize better writing (Scott, 1960). Breen therefore developed an algorithm which would identify and count uses of the perfect tenses and the passive voice. His strategy was to locate the auxiliary verbs (forms of "have" or "be") and then look for a past participle (the algorithm searched for an -ed ending, or for membership on a list of 213 irregular past participles). Two general exceptions were noted: If a form of "be" were followed by a relative pronoun, then by a past participle, it was not counted as a passive verb. (Example: "There were many people (who) sent gifts.") Similarly, if a form of "have" were followed by the word "to," then by a past participle, it was not counted as a perfect form. (Example: "Someday you will have (to) come here.")
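The strategy above can be approximated with the sketch below. It is not Breen's program: the three-word look-ahead window is an assumption, the nine-word participle set is a tiny stand-in for his list of 213, and the auxiliary and relative-pronoun sets are illustrative guesses.

```python
HAVE_FORMS = {"have", "has", "had", "having"}
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}
RELATIVE_PRONOUNS = {"who", "whom", "which", "that"}
# Tiny stand-in for the list of 213 irregular past participles.
IRREGULAR_PARTICIPLES = {"been", "come", "done", "given", "gone",
                         "seen", "sent", "taken", "written"}


def is_past_participle(word):
    """Crude test from the text: an -ed ending or a listed irregular form."""
    return word.endswith("ed") or word in IRREGULAR_PARTICIPLES


def count_verb_forms(words, window=3):
    """Return (perfect_count, passive_count) for one sentence."""
    words = [w.lower() for w in words]
    perfect = passive = 0
    for i, word in enumerate(words):
        if word in HAVE_FORMS:
            blockers, is_perfect = {"to"}, True
        elif word in BE_FORMS:
            blockers, is_perfect = RELATIVE_PRONOUNS, False
        else:
            continue
        for following in words[i + 1:i + 1 + window]:
            if following in blockers:  # one of the two noted exceptions
                break
            if is_past_participle(following):
                if is_perfect:
                    perfect += 1
                else:
                    passive += 1
                break
    return perfect, passive


passive_example = count_verb_forms("the gifts were sent by friends".split())
```

Like the rule it imitates, the -ed test will misfire on words such as "need" or "bed", so counts from a sketch of this kind are approximate.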

With the algorithm so developed, he found 367 occurrences of the perfect tenses, and 1323 occurrences of the passive voice. Of these latter, 135 were believed to be accompanied by a possible agent of the passive verb, a form generally regarded as worse than passive verbs not accompanying such explicit agents.

In general, Breen's hypotheses were not supported by the data. When the raw frequencies of such occurrences were correlated with the overall quality of essay, perfect tenses had a mere .03 relation. Passive voice occurrences had a correlation of .28 with essay grade, contrary to the prediction. And passive voice occurrences together with an agent had a correlation of .13 with essay grade. Unfortunately, the investigator did not control for essay length, which would expectably be correlated with these occurrences, and the discovered correlations are therefore harder to interpret than they might otherwise be. For example, if passive occurrences have a high correlation with essay length, and as we know essay length has a substantial correlation with essay quality, then the apparent correlation of passive occurrences with essay quality might be an illusion, and the meaningful correlation of the two variables might in fact be zero. And there are other possible third variables which would account for the apparent anomalies in the results. In any case, the project is turning toward a deeper syntactic analysis, as will be described in a later chapter.
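The third-variable argument above can be made concrete with the standard first-order partial correlation. In the sketch below only the .28 comes from the text; the other two correlations are HYPOTHETICAL values chosen to show how controlling for essay length could shrink the apparent relation, since the report does not give them.

```python
def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


r_passive_quality = 0.28  # reported in the text
r_passive_length = 0.60   # HYPOTHETICAL: passive count with essay length
r_length_quality = 0.45   # HYPOTHETICAL: essay length with essay quality

r_controlled = partial_correlation(r_passive_quality,
                                   r_passive_length,
                                   r_length_quality)
```

With these assumed values the partial correlation falls to roughly .01, which illustrates how the "meaningful correlation of the two variables might in fact be zero"; different assumed values would of course give a different result.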

Parenthetical Expressions. The final substudy described in this chapter was conducted again by Donald Marcotte. Parenthetical expressions are frequently used asides in writing. When properly employed, they are effective devices even though they do not contribute measurably to the over-all meaning of the sentence. The object of this section of the study is to determine whether the students used parenthetical expressions, and whether they used them judiciously. If so, then correlations between grades given on style and use of parenthetical expressions should be significant, and students using parenthetical expressions should receive higher grades than those not using them.

To test these hypotheses, we must first be able to identify a parenthetical expression. Fortunately, a parenthetical expression has two identifying features. The first identifying feature is its required punctuation. For example, Warriner and Griffith (1957, p. 580) state that "If he [the writer] wishes the reader to pause, to regard the expression as parenthetical, he sets it off; if not, he leaves it unpunctuated." Three types of punctuation marks are used in "setting off" the expression: commas, parentheses, and dashes.

The second identifying feature of the parenthetical expression is its placement in the sentence. According to Summey (1949, p. 60), there are three positions: "...(1) preliminaries, standing at the beginning of sentences or sentence members, (2) parenthetical groups in intermediate positions--commonly called parenthetical expressions with further qualification, and (3) tags or end parentheses."

With these two discernible cues, punctuation and position, and with a dictionary of parenthetical expressions, the computer can be programmed to identify these expressions in essays. The computer's dictionary consisted of 94 parenthetical expressions obtained from the textbook sources cited earlier, and from the opinionation-vagueness list already described.
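A minimal Python sketch of how the two cues might be combined follows. The five-entry dictionary is a hypothetical stand-in for the 94-expression dictionary, the sentence splitter is deliberately crude, and the regular expressions are an illustration of the idea rather than the routine Marcotte used.

```python
import re

# Hypothetical, abbreviated dictionary of parenthetical expressions.
EXPRESSIONS = ["however", "for example", "of course", "therefore", "too"]
OPEN = r"[,(]|--"    # punctuation that may open a set-off expression
CLOSE = r"[,)]|--"   # punctuation that may close one


def find_parentheticals(essay):
    """Return (expression, position) pairs for expressions that are set off by punctuation."""
    found = []
    for sentence in re.split(r"[.!?]+", essay.lower()):
        s = sentence.strip()
        if not s:
            continue
        for expr in EXPRESSIONS:
            e = re.escape(expr)
            if re.match(rf"{e}\s*(?:{CLOSE})", s):
                found.append((expr, "beginning"))
            elif re.search(rf"(?:{OPEN})\s*{e}\s*\)?$", s):
                found.append((expr, "end"))
            elif re.search(rf"(?:{OPEN})\s*{e}\s*(?:{CLOSE})", s):
                found.append((expr, "within"))
    return found
```

An expression that is not set off by punctuation is ignored, in keeping with the Warriner and Griffith criterion quoted above.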

Correlations and t-tests were used by Marcotte in the statistical analysis. First, correlations were computed to determine the relation between position of expression and grade given on style, and the relation between proportion of expressions used to number of sentences and grade given on style. Second, t-tests were used to determine if the group using parenthetical expressions received significantly higher grades on each of five traits (Ideas or Content, Organization, Style, Mechanics, and Creativity) than the group not using parenthetical expressions.

Less than half (n=112) of the students used parenthetical expressions contained in the computer program dictionary. Of the 216 expressions found, 132 were used to introduce sentences, sixty-seven were used within sentences, and seventeen were used to end sentences. Also, commas accounted for the punctuation of 215 expressions; the remaining expression was set apart by parentheses.

Table VII-5 consists of a list of identified parenthetical expressions. Evidently, words like "also," "however," "no," "therefore," and the phrase "for example" are favorite items.

Table VII-6 shows the results of one-tail t-tests. All five comparisons were significant at the .01 level. However, the largest t-value was for style, as was expected. Apparently, the use of parenthetical expressions, proper use of course, has some bearing on the grades given on essays.

Table VII-7 shows the correlations between position in the sentence and style. Also shown are the correlations between proportion of number of expressions in the essay to number of sentences in the essay and style. Except for the end position of the expression, all correlations are significant at either the .01 or the .05 level.

A summary of the Marcotte results, then, is as follows:

(a) One hundred twelve students used parenthetical expressions.
(b) Two hundred fifteen expressions were set off by commas.
(c) One expression was set off by parentheses.
(d) No dashes were used to punctuate the expressions.

TABLE VII-5

PARENTHETICAL EXPRESSIONS USED

Expression            Frequency   Beginning   Within   End

after all                  1           1         0       0
all in all                 1           1         0       0
also                      15          11         0       4
at least                   1           0         1       0
for example               17          14         2       1
for the most part          1           0         1       0
furthermore                1           1         0       0
generally                  1           1         0       0
however                   61          32        28       1
I am sure                  2           0         1       1
I believe                  2           2         0       0
I suppose                  1           0         1       0
I think                    1           0         1       0
if possible                1           0         1       0
in addition                1           1         0       0
in conclusion              3           3         0       0
in general                 2           0         2       0
in my opinion              8           6         2       0
it seems                   1           0         1       0
likewise                   2           1         1       0
maybe                      1           1         0       0
more or less               1           0         1       0
moreover                   1           1         0       0
nevertheless               3           3         0       0
no                        15          10         0       5
obviously                  1           1         0       0
of course                  7           4         3       0
oh                         5           5         0       0
on the other hand          8           3         5       0
ordinarily                 1           1         0       0
perhaps                    4           1         2       1
probably                   1           0         1       0
sometimes                  4           4         0       0
still                      1           1         0       0
that is                    3           3         0       0
therefore                 15          12         3       0
though                     4           0         3       1
to be sure                 2           0         2       0
too                        6           0         3       3
usually                    2           2         0       0
well                       7           6         1       0
why                        1           0         1       0

TABLE VII-6

MEAN GRADE DIFFERENCES BETWEEN THE PARENTHETICAL AND NON-PARENTHETICAL GROUPS

Traits               Difference            t      Probability
                     between Means(a)

Ideas or Content          1.94           3.05        <.01
Organization              2.28           3.42        <.01
Style                     2.43           4.02        <.01
Mechanics                 2.71           3.57        <.01
Creativity                1.66           2.60        <.01

(a) Parenthetical minus non-parenthetical.

TABLE VII-7

CORRELATIONS BETWEEN POSITIONS OF PARENTHETICAL EXPRESSIONS AND STYLE

Position                    Mean     S.D.      r      Probability(a)

Totals
  Beginning                 0.516    0.863    0.23
  Within                    0.262    0.599    0.21       <.01
  End                       0.066    0.293    0.00       n.s.
  Total                     0.844    1.154    0.28

Proportions
  Beginning/No. of Sent.    0.023    0.039    0.19       <.01
  Within/No. of Sent.       0.013    0.031    0.11       <.05
  End/No. of Sent.          0.003    0.013   -0.02       n.s.
  Total/No. of Sent.        0.038    0.054    0.20

(a) One-tail test.

(e) Except for the end position, all correlations for position were significant.
(f) Students using parenthetical expressions received significantly higher grades on all traits than those students not using the expressions.

Again, it is wise to note some reservations about these findings. It is probable that all of the reported differences for parenthetical expressions might be affected somewhat by essay length. If an expression occurs in one essay but does not occur in another, it is likely that the one in which it occurs is a longer essay than the one in which it does not occur. And this relationship could have influenced the significance levels of Table VII-6. The second half of Table VII-7 attempts to provide for this influence, by controlling for the number of sentences in the essays. Nevertheless, this is not a wholly satisfying control, since sentences containing parenthetical expressions might be presumed longer than sentences not containing such expressions; and the factor of essay length might still be the major contributor to the observed relationship. Further multivariate study must be conducted to ascertain just how useful the discovery of these parenthetical expressions is going to be.

However, one portion of the present finding does not appear subject to this criticism of length, and is also very pleasing from an intuitive point of view. This is the contrast found, in the bottom half of Table VII-7, for the various positions of the parenthetical phrases. The correlation with quality of the proportion of beginning phrases is .19; of the within phrases is .11; and of the end phrases a (non-significant) -.02. This order coincides very nicely with the general view that end expressions are weak, dangling, and anti-climactic, and that middle expressions too often interrupt and divide the sentence syntax. This work on parenthetical expressions deserves further attention.

Summary. This chapter has made a significant extension in the facilities of essay analysis, by introducing a powerful and convenient phrase look-up subroutine, called PHRASE, written primarily by Marcotte, and tested with a number of sub-studies. One of these investigated the importance of cliches in predicting essay quality. It found that cliches were, first, surprisingly rare in occurrence in student papers and, second, quite inert in their apparent influence on ratings by expert human judges. A second study also used the PHRASE algorithm, with some subjectively constructed dictionaries, to investigate hypothesized traits of opinionation, vagueness, and specificity in the same student essays. Although found to correlate in predicted directions with essay quality, these three traits did not, apparently, contribute important unique predictive variance to the ratings. Other studies reported here investigated correlative conjunctions and verb construction in an effort to find predictors of essay quality. And a final study showed a positive relationship between writing quality and the use of parenthetical expressions, and their position in the sentence where used. In general, these uses of phrase procedures had varying degrees of success in the search for the sources of essay quality, but together they indicate the expanded utility of the essay analysis program.

CHAPTER VIII

ON-LINE ANALYSIS AND FEEDBACK

As we have seen from the earlier chapters, this project has repeatedly demonstrated that a computer can read a student's essay and return a numerical rating which indicates the quality of the essay on one of a number of traits. These ratings have been found to be as reliable as those assigned by trained human judges. Since the computer can return such ratings one might well ask, "What else can the computer return, given an essay as input? Can the computer make comments about a student's essay, and if so, on what can these comments be based?"

In an attempt to answer these and similar questions, a computer program was developed. This program, the interactive essay grader, instructs the computer to read a student's essay, to make a series of comments about the essay, and to allow the student to correct some errors which the computer found, all in conversation mode.

We should make it clear at the very beginning that this program is not to be taken as a model of expert pedagogy. The program requires much refinement before it can be used in a real school situation. The primary purpose of the program is to illustrate some of the things that can be done, and to reveal some of the problems which were encountered in its development. Most of this work has been carried out by Dieter Paulus, with some assistance by Michael J. Zieky, and was reported in much the present form to the American Psychological Association (Paulus, 1967).

Background. Since we are basically concerned with a simulation problem, simulating by computer the feedback an English teacher might provide for her students, a reasonable place to start would be in the examination of some of the comments an English teacher might make.

First, the teacher might look at the content of the essay to see whether the student has demonstrated an understanding of the required concepts and a knowledge of the required facts. No attempt has yet been made to program the computer to comment on the content of the student's essay. (There is work now beginning in this field.)

Second, a teacher may look at the general structure of the essay. Here the teacher might be concerned with the soundness of the inferences a student draws, whether or not the mode of expression used by the student is appropriate, where the essay lacks clarity, or where a point needs further support. Here again, no efforts were made in this program to allow the computer to deal with these areas.

A third aspect of a student's essay that a teacher may look at, and frequently this is the most important and most time-consuming task in which an English teacher is involved in the teaching of elementary writing skills, is the appropriateness of the student's word usage, errors in declension, spelling errors, and so on. Comments relating to these areas are appropriately made if the writing of an essay is seen primarily as a drill exercise, and the student is asked to write many essays so that he may learn to avoid these errors. It is the comments a teacher writes relative to these types of errors that the present computer program attempts to simulate; for these comments are rather routine and take up much of the teacher's time and energy. If the computer can successfully take over this task, then it would be doing the teacher a tremendous service, as she could spend her time and energy in making comments of the first two types.

The type of feedback which this program attempts to simulate is of the prescriptive sort: comments that tell the student to avoid certain usages and suggest certain alternatives. If the student's essay deviates too greatly from the norm, then the computer indicates to the student where potential problems may lie and suggests corrective measures.

The program. The interactive essay grader is, with the exception of one short subroutine, written entirely in FORTRAN IV. The program was written on a remote teletype terminal, connected by telephone cable to M.I.T.'s IBM 7094 computer. A student who wishes to use the program simply types a code word on the console and the program begins to execute. At the appropriate point in the execution of the program, the computer asks the student to write the essay in natural language. The only restriction imposed on the student concerns special punctuation marks. This is due to the limited character facility of FORTRAN IV. When a subject has completed the essay, he is instructed to type an asterisk. The computer then starts almost immediately to respond and to comment on the student's essay.

As a language, of course, FORTRAN IV is not particularly well suited for natural-language computing. Therefore, the program in its present form is relatively inefficient and lacks elegance. Nevertheless, the computer requires only about twelve seconds of machine time to evaluate an essay and to begin commenting on it. Printing speed is, of course, considerably slower.

For purposes of describing the program, it may be conveniently, though artificially, divided into five parts. These are (1) the grading routine; (2) the prescriptive comments; (3) comments based on actuarial characteristics of the essay; (4) the interactive spelling routine; and (5) the recalling and recording of data about the essay. Each of these will be discussed in turn.

The program calculates a numerical grade for an essay by using a weighted sum of scores on eight variables. These variables were selected by a step-wise multiple regression process from the original 30 proxes used in the larger project. The eight variables which are included yield a multiple correlation coefficient of approximately .60 when used to estimate expert human ratings. The program takes the numerical grade and selects an appropriate comment from a list of comments. If, for example, the grade is quite high, the computer writes, "I think that you did quite well. Keep up the good work!" On the other hand, a very low grade calls for the response, "I don't think that you did at all well. Are you taking this assignment seriously?" Intermediate grades call for other comments. (Incidentally, if a student tries to fool the computer and types nothing but nonsense, the computer responds, "Stop wasting my time! If you don't stop playing around I will report you to your teacher".) These comments are used instead of numerical scores because they are presumably more meaningful to the student than, say, the number 2.8634. If teachers usually had time to write comments, they undoubtedly always would. The number or letter grade alone is primarily designed to save time.
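The grading routine can be pictured with a minimal Python sketch. The report does not give the eight regression weights or the grade cutoffs, so the weights, intercept, and cutoffs below are hypothetical; the comment wording is taken from this chapter and from Figure VIII-1.

```python
def numerical_grade(prox_scores, weights, intercept=0.0):
    """Weighted sum of prox scores, as produced by a fitted regression equation."""
    if len(prox_scores) != len(weights):
        raise ValueError("one weight is needed for each prox score")
    return intercept + sum(w * x for w, x in zip(weights, prox_scores))


# Hypothetical cutoffs, highest band first.
COMMENT_BANDS = [
    (4.0, "I think that you did quite well. Keep up the good work!"),
    (2.0, "I don't think that you did very well. Try harder next time "
          "and pay closer attention to what I tell you."),
]
LOWEST_COMMENT = ("I don't think that you did at all well. "
                  "Are you taking this assignment seriously?")


def comment_for(grade):
    """Pick the comment for the first band whose cutoff the grade reaches."""
    for cutoff, comment in COMMENT_BANDS:
        if grade >= cutoff:
            return comment
    return LOWEST_COMMENT


# Eight hypothetical prox scores and weights.
grade = numerical_grade([1.0] * 8, [0.5] * 8)
print(grade, comment_for(grade))
```

Keeping the bands in a table rather than in nested conditionals makes it easy to add the intermediate comments the text mentions.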

The prescriptive comments are called by a binary search phrase look-up subroutine which searches lists that have previously been entered into the computer's memory. Michael Zieky was primarily responsible for developing this portion of the program. The lists which can be searched by the computer may be of almost any length, limited only by the size of the computer. Since search time is not directly proportional to the length of the lists, these lists can grow to great lengths with only a trivial loss in computing time. For example, to search a list containing 16,000 words requires only one more comparison by the computer than to search a list of some 8,000 words. Hence the criticism as to why a particular word or phrase is not included in the list is quickly dispelled by simply including that word or phrase. The general classes of words and phrases included in this list and on which the computer comments are as follows: (1) taboo words, such as "aint" or "busted"; (2) misuses of case, such as "theirselves" or "to who"; (3) use of "of" for "have", "could of" or "should of", for example; (4) noun-verb disagreement, for example "both is" or "I are"; (5) misuse of homonyms, "their is", for example; (6) vulgar idioms such as "somewheres" or "that there"; and (7) double negatives such as "can't hardly" or "don't scarcely". As indicated before, it is only the researcher's knowledge and imagination that limit the classes and number of words or phrases to be included.

If the computer finds an improper usage it prints a message. For example, if the word "irregardless" is encountered by the computer, it responds, "'Irregardless' is actually a double negative. If you examine the first and last syllables you will see why.", or if the student writes "should of" the computer responds, "When we speak quickly the word 'have' often sounds like 'of'. But it should never be written that way." If the student writes "busted" the computer responds, "Do you really think the past participle of 'break' is 'busted' or were you just being careless?"
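A minimal Python sketch of such a look-up follows. The three-entry table is a hypothetical fragment of the usage list, the matching on single words and adjacent word pairs is an assumption, and the original subroutine was of course written in FORTRAN IV. The search also counts its probes, so the remark about 8,000 and 16,000 entries can be checked directly.

```python
import re

# Hypothetical fragment of the usage list, with comments quoted from the text.
USAGE_COMMENTS = {
    "busted": "Do you really think the past participle of 'break' is 'busted' "
              "or were you just being careless?",
    "irregardless": "'Irregardless' is actually a double negative. If you examine "
                    "the first and last syllables you will see why.",
    "should of": "When we speak quickly the word 'have' often sounds like 'of'. "
                 "But it should never be written that way.",
}
KEYS = sorted(USAGE_COMMENTS)
COMMENTS = [USAGE_COMMENTS[key] for key in KEYS]


def binary_search(sorted_keys, target):
    """Return (index, probes); index is -1 when the target is absent."""
    lo, hi, probes = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return mid, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return -1, probes


def usage_comments(essay):
    """Look up every word and every adjacent word pair in the usage list."""
    words = re.findall(r"[a-z']+", essay.lower())
    candidates = words + [" ".join(pair) for pair in zip(words, words[1:])]
    found = []
    for candidate in candidates:
        index, _ = binary_search(KEYS, candidate)
        if index >= 0:
            found.append(COMMENTS[index])
    return found


# Worst-case probes for lists of 8,000 and 16,000 entries.
print(binary_search(list(range(8000)), -1)[1], binary_search(list(range(16000)), -1)[1])
```

In this sketch the worst case is 13 probes for 8,000 entries and 14 for 16,000, which agrees with the one extra comparison cited above.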

Comments based upon actuarial characteristics of the essay are printed whenever some characteristic of the essay deviates too greatly from its normative use. For the time being, norms are based on a sample of 256 essays used in previous analyses and can readily be changed as the type of essay changes.

These comments are generally stated so as to indicate to the student that there may be a problem with given aspects of his essay. The computer might say, for example, "Your sentences seem long and complicated...." and ask the student a question, or suggest how the difficulty may be overcome. That is, the computer indicates that there may be a problem and suggests that the student check to see whether or not a problem really exists.
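One way to picture such an actuarial comment is as a standard-score test against the norm. In the minimal Python sketch below the norm for words per sentence (mean 18, standard deviation 5) and the threshold of 1.5 standard deviations are hypothetical; the report says only that comments are triggered when a characteristic deviates too greatly from norms based on 256 essays.

```python
import re

# Hypothetical norm for average words per sentence: (mean, standard deviation).
NORM_WORDS_PER_SENTENCE = (18.0, 5.0)


def actuarial_comments(essay, threshold=1.5):
    """Comment when average sentence length lies far from the assumed norm."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z']+", essay)
    if not sentences:
        return []
    mean, sd = NORM_WORDS_PER_SENTENCE
    z = (len(words) / len(sentences) - mean) / sd
    if z > threshold:
        return ["Your sentences seem long and complicated."]
    if z < -threshold:
        return ["Your sentences seem short and choppy."]
    return []
```

The same pattern would extend to other counts, such as paragraphs or commas, by adding one norm per characteristic.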

The interactive spelling routine again utilizes a binary search to determine which words are misspelled. The present list includes some 750 words. First the computer prints a list of the misspelled words that it found, then it gives the student an opportunity to correct them. The student is asked to spell a given word correctly, and if he does so, the computer responds "That is correct. Very good." If the student continues to spell the word incorrectly, the computer first suggests that the student try again, then, if it is again incorrectly spelled, that he look the word up in a dictionary. If the student again makes an error, the computer finally suggests that the student go and seek his teacher's help; then it goes on to the next word. The computer determines whether or not the word is spelled correctly by looking the word up in a list of correctly spelled words corresponding to those spelled incorrectly in the spelling list.
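The dialogue logic can be sketched in a few lines of Python. The four-entry table is a hypothetical fragment of the 750-word list, a dictionary lookup stands in for the binary search, and the wording of the final "teacher" message is invented, since the report describes that step without quoting it.

```python
import re

# Hypothetical fragment of the misspelling list, paired with correct spellings.
MISSPELLINGS = {
    "favorit": "favorite",
    "exept": "except",
    "jewelery": "jewelry",
    "beleive": "believe",
}
RETRY_MESSAGES = [
    "No, that is still not correct. Would you please try again.",
    "I am sorry. That is not correct. Would you please look the word up "
    "in the dictionary and try again.",
    "Please ask your teacher for help with this word.",  # wording assumed
]


def find_misspellings(essay):
    """List each listed misspelling once, in order of first appearance."""
    found = []
    for word in re.findall(r"[a-z']+", essay.lower()):
        if word in MISSPELLINGS and word not in found:
            found.append(word)
    return found


def drill(misspelled, attempts):
    """Return the computer's responses to up to three successive attempts."""
    correct = MISSPELLINGS[misspelled]
    responses = []
    for attempt, retry_message in zip(attempts, RETRY_MESSAGES):
        if attempt.strip().lower() == correct:
            responses.append("That is correct. Very good.")
            return responses
        responses.append(retry_message)
    return responses


print(drill("jewelery", ["jewlery", "juwelery", "jewelry"]))
```

Passing the attempts as a list, rather than reading the console, keeps the sketch testable; an interactive version would read each attempt in turn.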

There are several problems inherent in this procedure. First, the list of approximately 750 misspellings seems to be quite inadequate. This judgement is based on the examination of a glossary of the words used in 256 essays written by high school students. It was discovered that only a fraction of the misspellings found in those essays appeared on the spelling list. But again, the list can be easily increased in length. Second, it is sometimes difficult to determine whether or not a word should be included in the spelling list at all, since some commonly misspelled words correctly spell some other words. For example, if the word 'MUSSEL' is included as a possible misspelling for the word 'MUSCLE', then if the student really intended to spell 'MUSSEL' it will be counted as a misspelled word. A further problem involves a student who misspells a word not in the anticipated manner, in a plural form for example. Then the computer will not recognize the word as a misspelling. A partial solution to these problems may lie in the inclusion of an extensive dictionary of correct spellings along with given rules for forming plurals, possessives, etc. which may be applied by the computer. This approach is currently under investigation by Francis Archambault.

The last part of the program deals with the recording of data about a student's essay for use in making comments on future essays by that same student and for reporting the student's progress, or lack thereof, to his teacher. These data are recorded on a disk and are always available to the computer. The computer can, therefore, look back to the student's previous performance and compare his present performance to that. For example, when commenting on a student's overall grade, the computer can add, "You did much better than last time. Very good!", or, if the student makes a greater number of grammatical or word usage errors, the computer may comment, "With respect to grammar and word usage you have done considerably worse this time than last time". Similar comments are made when the total number of spelling errors is reported. If a student makes the same spelling error in two consecutive essays, the computer comments, "By the way, you made this same error the last time that you wrote an essay for me. Please be more careful."
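The comparison with the stored record might be sketched as follows in Python. The record layout (a grade, a count of usage errors, and a list of misspellings) is an assumption, as is the rule that any change in grade or error count triggers a comment; the report does not state how large a change must be.

```python
def progress_comments(previous, current):
    """Compare the current essay record with the one stored for the last essay."""
    comments = []
    if current["grade"] > previous["grade"]:
        comments.append("You did much better than last time. Very good!")
    elif current["grade"] < previous["grade"]:
        comments.append("You didn't do as well as you did last time.")
    if current["usage_errors"] > previous["usage_errors"]:
        comments.append("With respect to grammar and word usage you have done "
                        "considerably worse this time than last time.")
    repeated = sorted(set(current["misspellings"]) & set(previous["misspellings"]))
    for word in repeated:
        comments.append(f"{word.upper()}: By the way, you made this same error "
                        "the last time that you wrote an essay for me. "
                        "Please be more careful.")
    return comments


# Hypothetical records for two consecutive essays by the same student.
last_time = {"grade": 3.0, "usage_errors": 6, "misspellings": ["beleive"]}
this_time = {"grade": 2.5, "usage_errors": 6, "misspellings": ["favorit", "beleive"]}
for line in progress_comments(last_time, this_time):
    print(line)
```

A teacher's summary could be produced from the same stored records by listing them in order for a given student code number.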

These data are also used to provide the teacher with feedback about a student's essays. By typing the appropriate code word and code number on the teletype console, the teacher may obtain a summary of the grades on all of the essays a student has written, as well as other summary information about various errors the student has made in each essay that he has written. If the teacher wishes, she can obtain complete copies of essays any student has written.

Figure VIII-1 shows some sample computer output. The computer's comments are designated by ordinary type. The student's essay and responses are in italics. The essay which the computer evaluated was written by Paulus to demonstrate some features of the program.

Summary. A computer program has been developed which provides limited feedback to both the student and to the teacher about a student's essay. Content and general stylistic considerations are not considered as yet, but the program now comments on some errors in grammar and word usage, as well as on certain actuarially derived variables. A great deal of flexibility is provided for increasing the number of errors which the computer can recognize and comment upon. The program is, as of now, untried in real classroom situations. But it does illustrate a number of interesting possibilities and indicates that this general area is a fruitful area for further research. The computer can provide relevant feedback to the student about his essays. This feedback can deal with a variety of topics and can take many different forms.

Future steps should eventually include the refinement of the spelling routine, the development of longer and more comprehensive lists of possible errors, the refinement and expansion of the comments, and the re-writing of the entire program in a more suitable language, such as PL/1. Teachers should also have greater control over comments and when they are printed. Carl Helm's PROTRAN, or a similar language, appears to be admirably suited for such a purpose.

FIGURE VIII-1 (Note: The "student's" responses are in italics)

I AM AN IBM COMPUTER. WHAT IS YOUR NAME PLEASE.Q

JOHNNY H. DOE

DID YOU EVER WRITE AN ESSAY FOR ME BEFORE.Q

YES

HELLO, JOHNNY H. DOE, IT IS NICE TO TALK TO YOU AGAIN. PLEASE TYPE THE CODE NUMBER ASSIGNED TO YOU THE LAST TIME WE TALKED.

89

THANK YOU. WOULD YOU PLEASE READ THE DIRECTIONS ON THE FRONT OF THE MACHINE NOW, AND BEGIN WHEN READY.

THE NAME OF MY FAVORIT TELEVISION PROGRAM IS BATMAN. I WATCH IT EVERY CHANCE I GET EXEPT LAST WEEK WHEN OUR TV SET WAS BUSTED. THEN I COULDN'T SEE IT.

THE WEEK BEFORE THAT THEY HAD A REALLY GOOD SHOW ON. THE CATWOMAN STOLE SOME JEWELERY SHE LIKED. SHE HID THE STUFF SOMEWHERES. I BELEIVE THAT IT WAS IN HER HIDE-OUT. BATMAN AND ROBIN AINT STUPID. SO THEY SOON FOUND IT AND ALMOST CATCHED THE CATWOMAN. SHE GOT AWAY DURING THE FIGHT; THEY COULD HAVE FOUND HER LATER. BUT THEY DIDN'T WANT TO GIVE THEIRSELVES AWAY.

THEIR IS OTHER NEAT CROOKS ON THE PROGRAM TOO. LIKE THE PENGUIN THE RIDDLER AND THE JOKER.

THEY ARE ALWAYS TRYING TO GET THE DYNAMIC DUO, BUT THEY NEVER SEEM TO BE ABLE TO. *

HERE IS SOME INFORMATION ABOUT YOUR ESSAY.

**7094****7094****7094****7094****7094****7094****7094****

I DON'T THINK THAT YOU DID VERY WELL. TRY HARDER NEXT TIME AND PAY CLOSER ATTENTION TO WHAT I TELL YOU. (YOU DIDN'T DO AS WELL AS YOU DID LAST TIME.)

(IF YOU DON'T AGREE WITH THIS, OR WITH ANY OF MY OTHER COMMENTS, CHECK WITH YOUR TEACHER.)

IN RESPECT TO GRAMMAR AND WORD USAGE, YOU HAVE DONE ABOUT AS WELL THIS TIME AS YOU DID LAST TIME YOU WROTE AN ESSAY FOR ME.

DO YOU REALLY THINK THE PAST PARTICIPLE OF "BREAK" IS "BUSTED", OR WERE YOU JUST CARELESS.Q

"SOMEWHERES" IS AN EXAMPLE OF POOR SPEECH HABITS SHOWING IN YOUR WRITING. THE WORD SHOULD BE PRONOUNCED AND SPELLED WITHOUT THE FINAL "S".

I NOTICE YOU USED THE WORD "AINT". MANY PEOPLE STILL OBJECT TO IT. SEE IF YOUR TEACHER ALLOWS THAT WORD BEFORE YOU USE IT AGAIN.

SOME VERBS ARE IRREGULAR BECAUSE THEY FORM THEIR TENSES IN A STRANGE WAY. "CATCH" IS SUCH A VERB, AND ITS PAST TENSE SHOULD BE "CAUGHT", NOT "CATCHED".

THE WORD "THEIRSELVES" SHOULD BE REPLACED BY "THEMSELVES". SEE IF "THEIRSELVES" IS IN YOUR DICTIONARY.

I SEE THAT YOU HAVE USED "THEIR" FOLLOWED BY A VERB. THE CORRECT FORM IS "THERE". WORDS THAT SOUND ALIKE BUT ARE SPELLED DIFFERENTLY ARE CALLED HOMONYMS. CAN YOU THINK OF ANY OTHERS.Q

YOUR ESSAY IS CONSIDERABLY SHORTER THAN THE ASSIGNMENT REQUIRED. HAVE YOU FULLY DEVELOPED THE TOPIC.Q

YOUR SENTENCES SEEM SHORT AND CHOPPY. COMBINE SEVERAL USING APPROPRIATE CONNECTIVES SUCH AS 'AND' OR 'OR', FOR EXAMPLE.

YOU USED 5 PARAGRAPHS.

WHY SO MANY.Q TRY TO REORGANIZE THE ESSAY SO THAT YOU DON'T USE AS MANY PARAGRAPHS. USUALLY, A PARAGRAPH SHOULD CONSIST OF SEVERAL SENTENCES, AND SHOULD PRESENT ONE COMPLETE UNIT OF THOUGHT.

I NOTICED THAT YOU DIDN'T USE VERY MANY COMMAS. DID YOU SEPARATE ITEMS IN A SERIES BY COMMAS, USE COMMAS IN COMPOUND SENTENCES, USE COMMAS AFTER INTRODUCTORY PHRASES.Q PERHAPS YOU HAD BETTER GO BACK TO YOUR ESSAY AND CHECK THOSE THINGS.

I AM NOT CERTAIN BUT I THINK THAT YOU MADE 4 SPELLING ERROR(S).

THAT IS A GREATER NUMBER OF ERRORS THAN YOU MADE IN YOUR LAST ESSAY. PLEASE BE MORE CAREFUL.X

HERE IS A LIST OF YOUR SPELLING ERRORS.

FAVORIT
EXEPT
JEWELERY
BELEIVE

(THERE MAY BE OTHER ERRORS THAT I DID NOT FIND.)

LET'S SEE IF WE CAN CORRECT THE ERRORS I FOUND.

LOOK AT THE FOLLOWING WORD.

FAVORIT

PLEASE TRY TO SPELL THE WORD CORRECTLY.

(TYPE ON THE NEXT LINE AND PUSH 'RETURN' ONE TIME.)

FAVORITE

THAT IS CORRECT. VERY GOOD.

LET'S GO ON TO THE NEXT WORD.

LOOK AT THE FOLLOWING WORD.

EXEPT

PLEASE TRY TO SPELL THE WORD CORRECTLY.

EXCEPT

THAT IS CORRECT. VERY GOOD.

LET'S GO ON TO THE NEXT WORD.

LOOK AT THE FOLLOWING WORD.

JEWELERY

PLEASE TRY TO SPELL THE WORD CORRECTLY.

JEWLERY

NO, THAT IS STILL NOT CORRECT. WOULD YOU PLEASE TRY AGAIN.

JUWELERY

I AM SORRY. THAT IS NOT CORRECT. WOULD YOU PLEASE LOOK THE WORD UP IN THE DICTIONARY AND TRY AGAIN.Q

JEWELRY

THAT IS CORRECT. VERY GOOD.

LET'S GO ON TO THE NEXT WORD.

LOOK AT THE FOLLOWING WORD.

BELEIVE

(BY THE WAY, YOU MADE THIS SAME ERROR THE LAST TIME THAT YOU WROTE AN ESSAY FOR ME. PLEASE BE MORE CAREFUL.X)

PLEASE TRY TO SPELL THE WORD CORRECTLY.

BELIEVE

THAT IS CORRECT. VERY GOOD.

WELL, I HOPE THAT TAKES CARE OF THE SPELLING ERRORS. PLEASE TRY NOT TO MAKE THE SAME ONES AGAIN.

**7094****7094****7094****7094****7094****7094****7094****

WELL, JOHNNY H. DOE, IT WAS NICE TO TALK TO YOU AND TO READ YOUR ESSAY.

I HOPE THAT YOU WILL COME BACK SOON TO WRITE ANOTHER ONE. MEANWHILE, PLEASE THINK ABOUT WHAT I TOLD YOU. GOOD BYE.

**7094****7094****7094****7094****7094****7094****7094****
**7094****7094****7094****7094****7094****7094****7094****

PLEASE NOTE YOUR NEW CODE NUMBER WHICH IS 90. THANK YOU.

DO YOU WANT TO WRITE ANOTHER ESSAY NOW.Q PLEASE ANSWER 'YES' OR 'NO'. (NO BLANKS.)

NO

EXIT CALLED.

Even though much work remains, and many problems are as yet unsolved, the interactive essay analyzer designed by Paulus seems to have opened the door to research in a relatively new aspect of computer assisted instruction, an aspect of computer assisted instruction that allows the computer to assume a greater role than that of a "mechanized scrambled book". The computer begins to understand what it is told by the student and is able to intelligently respond to him. Such on-line work should eventually become an important area of application.

CHAPTER IX

CONCLUSIONS AND IMPLICATIONS

Chapters I through VIII have discussed rationale, methods of empirical research, and various findings from the work to date. However, the investigators have recognized from the beginning the extreme newness of this study, and its vast potentialities for the future of educational measurement and instruction. Consequently, part of the original charge of this project was to scan the field constantly for new opportunities of research and practice. Some of the recognized opportunities will grow rather directly out of the work so far accomplished within the project, but others will stem from synthesis with other work in related fields. Therefore this chapter will perform three functions: (1) It will summarize the preceding chapters and the major line of work within this project. (2) It will discuss work in tangential fields, and the general status of the disciplinary interface most appropriate to the future of essay analysis. (3) It will point out some appropriate directions for future work within the field of educational measurement and instruction, future work which may be closely related to this project.

1. Summary of Work Completed

Rationale. The basic strategies of the computer analysis of essays have all grown out of an attempted simulation of human ratings. The fundamental approach has been to seek a goal of automatic analysis of stylistic qualities in essays, and the techniques have been generally actuarial. That is, we have looked for a simulation of human, expert judgment of intrinsic qualities (trins), through an exploration of correlated, or approximate variables (proxes), which could be made logistically available for computer measurement.

When this general strategy was decided upon, there were various problems which needed to be solved: The subjects have largely consisted of Wisconsin High School students who, in 1962, wrote a series of essays under controlled conditions. (There have been other subjects not so intensively studied.) There was abundant information about the Wisconsin students. The data to be analyzed for proxes consisted of various sets of essays written by these students, as key-punched literatim for computer input. The criterion for success in computer strategy has consisted of the trins of expert human judges, first ratings for overall quality generated by four judges for each essay, and later ratings for ideas, organization, style, mechanics, and creativity generated by eight (different and independent) judges for each essay.

The proxes themselves consisted of various computer measurements hypothesized to have a potential relationship to the trins sought after. Some of these were statistical counts relating to length within the essay, and others were measures of types of words used. Still others investigated characteristics of sentence openings or other structures. Thirty proxes, which were most extensively explored, largely treated single words as units. Later proxes have treated various patterns of phrases, both intact and separated. All of these proxes were studied for possible correlation with the trins of essay quality, either in bivariate or multivariate relationships, and their ability to predict trins is in some ways the backbone of the empirical work to date, just as the development of the rationale, and of the various programming and statistical strategies used, is the backbone of the methodological work to date.

Findings. Chapter III specified hypotheses about certain of the proxes, and described the computer program (called PEG, listed in Appendix A), with some of its features. Chapter IV explored the questions of reliability and validity of the proxes, and showed the ability of the computer strategy to predict the overall rating of quality about as well as the average human judge. It also discussed some of the ways in which the computer may be superior to the judge: especially in adjusting the "severity" and the dispersion of the grading system according to any uniform, predetermined standard. On two sets of essays, the computer program was able to reach multiple-regression coefficients of .71. Also, one essay's proxes were able to predict the judgments of other essays written by the same student, to a MULTR of .62. A conservative cross-validation of the program showed the ability to generate large numbers of ratings which were indistinguishable from those of the human judge. In sum, the proxes contributed significantly, in the predicted directions, to produce quite humanoid ratings of overall quality. And the Paulus tables were convenient tools for such multivariate analysis.

Chapter V made a major expansion in the Program, by moving the simulation strategies to a profile of scores. The human ratings were those of 32 expert English teachers, with eight judges evaluating each of 256 essays on five major traits of writing quality, each spelled out carefully according to accepted dimensions. The individual judges were found to correlate only weakly with each other, but there was a strong tendency to a halo effect, i.e., to great uniformity of profile for any given essay judged by any given rater. However, there was sufficient profile consensus for a significant interaction of trait by essay. The proxes contributed differentially to the five traits and, halo aside, there were interesting relationships shown: For example, length of essay contributed highly to content, organization, and creativity, but not at all to mechanics. There was thus intuitive mutual support for the validity of the ratings and of the computer system.

The intercorrelations of the traits showed coefficients which were actually higher than the reliability of the individual traits, a surprising finding but an understandable one in view of the halo tendency, and the relative independence of the reliability. Some effort was made to cluster common judge viewpoints into a purer criterion, for purpose of simulation, and implications of this work were discussed. A most interesting comparison of this chapter was the relative ability of the computer program to simulate the various traits. Although human judges were much more reliable in judging mechanics than in judging any other trait, and somewhat less reliable in judging creativity, the computer program displayed no such handicap, and did as well with the more subjective, "qualitative" dimensions as with any.

Chapter VI made some studies of the problem of nonlinearity of prediction in such multivariate simulation. Clearly, some of the prox distributions were odd ones, and their relations with each other, and with the criterion, were irregular. The two methods of correction explored were interaction terms and transformations of the proxes. For various reasons, these were not successful in increasing the overall cross-validated multiple regression, and for practical purposes the linear assumption remained a powerful and useful one, even where it was not exactly true. Some useful programs were developed for displaying bivariate relationships and for modifying variables systematically.
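The linear combination of proxes summarized above can be sketched as an ordinary least-squares fit, with the multiple correlation (MULTR) taken as the correlation between predicted and observed ratings. This is a minimal illustration in Python under stated assumptions: the function names are hypothetical, the six rows of data are invented for the example, and a cross-validated coefficient would be computed by applying the fitted weights to essays not used in the fit.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit(proxes, trin):
    """Least-squares weights (intercept first) from the normal equations."""
    X = [[1.0] + list(row) for row in proxes]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, trin)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, row):
    """Predicted rating for one essay's proxes."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))

def multiple_r(weights, proxes, trin):
    """Correlation between predicted and observed ratings."""
    pred = [predict(weights, row) for row in proxes]
    mp, mt = sum(pred) / len(pred), sum(trin) / len(trin)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, trin))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in trin))
    return cov / (sp * st)

# Invented data: (words in essay, commas) and a judged overall quality.
proxes = [(120, 3), (250, 8), (180, 5), (300, 12), (90, 1), (210, 9)]
ratings = [2.0, 4.0, 3.0, 4.5, 1.5, 3.5]
weights = fit(proxes, ratings)
print(weights, multiple_r(weights, proxes, ratings))
```

The normal equations are solved directly here only to keep the sketch free of outside libraries; any standard regression routine would serve as well.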

Chapter VII expanded the work of the computer programming to analysis of text strings of more than one word in length. A phrase lookup algorithm was listed as an adjunct to the main program, and was used in a number of substudies. One of these explored the essays for the presence of standard cliche phrases. It did not find them in common or injurious use, and where they did occur their presence seemed uncorrelated with essay quality. Another substudy used the same algorithm to locate phrases believed to characterize student traits of opinionation, vagueness, or specificity. As predicted, the first two were found negatively correlated, the last positively correlated, with essay quality, but the significance could probably be accounted for by third variables of word commonness which distinguished the lists. Other substudies found null relationships between essay quality and correlative conjunctions (for one investigator) and verb voice and tense (for another). One significant study also used the phrase algorithm to examine parenthetical expressions, and found them indeed, as might be predicted, related to essay quality according to whether they occurred at the beginning (good), middle (less good), or end (perhaps poor) of a sentence. Such phrase lookup thus represented a step upwards in the power of the analysis program.
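A phrase lookup of the general kind described can be sketched in a few lines of Python. This is not the algorithm listed with the main program; it is a minimal illustration, and the phrase list and the function name `find_phrases` are assumptions made for the example.

```python
import re

def find_phrases(text, phrases):
    """Count occurrences of each multi-word phrase, matching whole words only."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {}
    for phrase in phrases:
        target = phrase.lower().split()
        n = len(target)
        # Slide a window of n words across the essay and compare.
        counts[phrase] = sum(
            1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target
        )
    return counts

# Hypothetical lists; the project's own cliche and opinionation lists are not reproduced here.
phrases = ["last but not least", "in my opinion", "sort of"]
essay = ("In my opinion, money is sort of a goal. "
         "Last but not least, in my opinion it is a hindrance.")
counts = find_phrases(essay, phrases)
print(counts)
```

The counts for each list can then be treated as further proxes and correlated with the judged quality of the essays.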

Finally, Chapter VIII implemented an on-line, interactive program to demonstrate the potential practical uses of such a system for eventual classroom applications. The program works at a time-sharing console, and is written in FORTRAN IV, like the other programs here reported. It greets the student and defines the essay assignment. When the student has finished his essay and signaled his completion, the computer (IBM 7094) begins in about 10 seconds with diagnosis, evaluation, drill, and advice. The algorithms were largely ad hoc and specific to certain narrow classes of errors. Much basic work is needed for a truly flexible system. Yet the program should help demonstrate that there is nothing in principle about the computer which will prevent a vast range of essay-analyzing applications in the future.

In short, the chapters up to this point have described the actuarial rationale, the deliberate limiting of focus, the implementation of computer algorithms, the construction of suitable criteria, and the empirical results of the current state of the art of automatic essay analysis. These chapters have also explored some statistical possibilities, various additional proxes, and some on-line token implementations in simulated settings. The remainder of this final chapter will consider certain additional possibilities of interest in the work of contemporary scholars, and will point some possible directions for the most promising future investigation of the lines here begun.

2. Some Work Related to the Project

Since the inception of Project Essay Grade, much work has gone on in areas related to the project. The investigators have made additional explorations into related disciplines, and have kept constant contact with them. For future investigators in automatic essay analysis, some knowledge of this outside but related work is essential, if they are to avoid the terrible expenses of redundance or ignorance. Therefore, this section will briefly describe some of this related work.

Journals. The related disciplines continue to grow rapidly in activity. Two journals have appeared which capitalize on the potential relevance of computation for language processing in traditional scholarship. One of these is Computers and the Humanities, since 1966 a quarterly edited by Joseph Raben at Queens College. A larger quarterly is coming out in early 1968, Computer Studies in the Humanities and Verbal Behavior, published by Mouton Press with an interdisciplinary editorial committee. (The first author here is the editorial advisor for education.) And The Journal for Educational Data Processing shows interest in natural language.

Societies. Organizationally, a great deal is happening. The Association for Educational Data Systems (AEDS) is only peripherally interested in natural language, but its involvement seems to be increasing. The Association for Computing Machinery (ACM), a very vigorous and strong organization of computer scientists numbering over 20,000, has a great deal of interest in relevant fields. It has a special interest committee for artificial intelligence (SICART), which is changing to established group status, and a group for information retrieval (SIGIR). And it has a newly forming committee for language analysis and studies in the humanities (SICLASH) which has already a substantial initial membership. The American Documentation Institute (ADI) has recently changed its name to the American Society for Information Science (ASIS), and has a keen interest in many areas overlapping this project. All of these societies put out newsletters, journals, or both. Perhaps the most acutely relevant body is the Association for Machine Translation and Computational Linguistics (AMTCL), which publishes its own journal and a useful newsletter called The Finite String. This group holds its own meetings in conjunction with ACM and the Linguistic Society of America, and has participated in two international conferences in the field.

The oldest societies within the humanities, such as the Modern Language Association (MLA), are notoriously tradition-bound, but even in the MLA a computer group is establishing a fairly permanent event at the Annual Meeting.

Besides AEDS, the educational and behavioral societies have indicated a growing interest. The pre-session training conferences held before the Annual Meeting of the American Educational Research Association (AERA) have been stimulating more sophisticated computer strategies for some years (with sponsorship from the United States Office of Education). These have been increasingly oriented toward interactive work, especially in CAI, which has strongly overlapping interests with natural-language analysis. And in 1968 we conducted the first such workshop entirely concerned with natural-language analysis for educational research.

Textbooks. A discipline has difficulty in growing rapidly until authors have defined it in suitable textbooks. There are a number of such books which bear on this work, though none is currently satisfactory for most courses which are being conceived. Works edited by Garvin (1963) and by Feigenbaum and Feldman (1964) have been mentioned earlier in this report, and so has the older one authored by Oettinger (1960) on machine translation. An excellent related work is that by Becker and Hayes (1963) on information storage and retrieval.

New arrivals include a rather descriptive book in the humanities, edited by Bowles (1967), and an important work in Automatic Language Processing edited by Borko (1967). One of the most useful works, though not readable by any but professionals, is a new book in computational linguistics by Hays (1967). A forthcoming work on computers in education, written by Allan B. Ellis, will surely feature some natural-language work. And another forthcoming work by Gerard Salton on information retrieval (due in 1968) should be valuable to some workers in natural-language analysis. A text by Veldman (1967) on FORTRAN programming for behavioral scientists has one chapter on verbal data which should prove very useful.

In general, materials suitable for instructing in essay analysis can be pieced together from such works as these, various programming texts, and works on statistics and on linguistics. But the field still lacks a suitable synthesis textbook for all introductory purposes, and work may proceed without it for some time.

Other books. On the other hand, books which have some more distant bearing on natural language seem to be growing rapidly in number and quality, and should receive at least brief mention. In theories of automata, the growth has been especially brisk. Robert Korfhage (1966) has produced a book which relates computation to recent and current activities in mathematical logic, and the production languages described have high relevance to context-free grammars and, indeed, to the basic optimism about what computers may accomplish. Marvin Minsky's book (1967) will surely open the field of computation theory to many persons who would otherwise not have made contact with it, and should thereby produce indirectly much important practical and theoretical work. And Taylor Booth (1967) has unquestionably produced the most impressive compendium on automata theory so far. Such activity has been going on before now, but has only recently surfaced in such organized forms.

In the field called "artificial intelligence," we have already seen that activity is growing with computer science. Carne (1965) has one attempted synthesis of some central concepts, and other, larger works are reportedly in preparation. At first glance, such works may seem irrelevant to natural language processing, but the present writers do not believe that they are. Rather, they serve to change the way that computers are regarded, altering their image from that of a slavish, pedestrian worker to that of a universal machine. This seems to us a very important and necessary change in the behavioral applications of computer science.

Recent related work. Earlier portions of this report discussed some related work in other disciplines. This section will comment on some recent lines of such development, which seem particularly meaningful. This will not attempt a complete coverage of such work, but will only indicate a few of what may be major lines of related investigation, over a longer period.

We have said that the work of Project Essay has so far been actuarial in nature, leaning on statistical relations between prox and trin more than on deterministic strategies. Such statistical strategies should not be underrated. As Sapir has written, "All grammars leak." No matter how the future of such work develops, it is hard to foresee a time when serious simulation will dispense with a large probabilistic element. Yet Project Essay wishes to push ahead with the deeper linguistic and psychological dimensions as well, and to take maximum advantage of any developments in these areas.


Parsing. In the linguistic world, there are certain lines of investigation which seem very promising. One of these is in context-free and other parsing schemes aimed at syntactic analysis. Of all parsers constructed, the most realistic one so far is the Oettinger-Kuno multiple-path predictive parser at the Harvard Aiken Laboratory.

The nature of current parsing systems is described in a number of places (Garvin, 1963, pp. 223-232; Bobrow, 1967; Hays, 1967, ch. 6), and will not be explained here in any detail. Since Kuno's program enjoyed such eminence, we were very interested in possible applications, and Professor Kuno kindly arranged for parsing 50 sentences from the Wisconsin essays. The results of this processing will be briefly set forth, and illustrated.

Multiple Path System. In order to use the Kuno parsing system, every word of the text must be found in a "dictionary," that is, a list of words accompanied by their possible syntactic roles, encoded in a way that is useful to the system. The ordinary "noun" or "verb" is not sufficient; there are various restraints on words which are not adequately described by such broad designations, and therefore such dictionaries need painstaking construction. The Harvard dictionary is still quite limited, and some of the common student words needed to be supplied (as did all misspellings).

Figure IX-1 shows the result of looking up the words of one student sentence in this special dictionary. This sentence was: Money becomes a hindrance when it ceases to aid in the attainment of one of the best things and becomes a goal itself. Figure IX-1 shows many ambiguities in the possible syntactic roles to be played by most of the words of this sentence. Only "of" and "and" presented no homographs, and "aid" possessed seven homographs to compete for the correct parsing.

FIGURE IX-1

LISTING OF HOMOGRAPHS FROM THE PARSING DICTIONARY FOR A STUDENT SENTENCE

CORPUS NUMBER 01     SENTENCE NUMBER 000024

WORD          HOMOGRAPHS

MONEY         NNNS MMMS NOUS
BECOMES       VI2S VI3S VT1S
A             AAA ART
HINDRANCE     NNNS MMMS NOUS
WHEN          IAV CO2 RL6
IT            TITS PRNS PRC
CEASES        VT1S VI1S
TO            TOIS PRE
AID           NNNS MMMS NOUS VT1P IT1 VI1P II1
IN            PRE AV2
THE           AAA ART
ATTAINMENT    NNNS MMMS NOUS
OF            PRE
ONE           NNNS INIS NUNS
OF            PRE
THE           AAA ART
BEST          AV1 AAA ADJ NNNC MMMC NOUC
THINGS        NNNP MMMP NOUP
AND           XCO
BECOMES       VI2S VI3S VT1S
A             AAA ART
GOAL          NNNS MMMS NOUS
ITSELF        PRO AV1
.             PRD
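The dictionary lookup behind Figure IX-1 can be sketched as follows. This is a minimal Python illustration, not the Harvard system: the entries are a small excerpt transcribed from the figure as best it can be read, and the helper names `lookup` and `category_assignments` are hypothetical. The product of the homograph counts gives an upper bound on the category sequences a parser would face if it had no grammar to prune them.

```python
import math

# Excerpt of the homograph listing in Figure IX-1 (codes as transcribed from the figure).
HOMOGRAPHS = {
    "money": ["NNNS", "MMMS", "NOUS"],
    "becomes": ["VI2S", "VI3S", "VT1S"],
    "a": ["AAA", "ART"],
    "hindrance": ["NNNS", "MMMS", "NOUS"],
    "aid": ["NNNS", "MMMS", "NOUS", "VT1P", "IT1", "VI1P", "II1"],
    "of": ["PRE"],
    "and": ["XCO"],
}

def lookup(words):
    """Split words into those found in the dictionary and those that must be supplied."""
    found, missing = {}, []
    for w in words:
        key = w.lower()
        if key in HOMOGRAPHS:
            found[key] = HOMOGRAPHS[key]
        else:
            missing.append(w)  # e.g. a misspelling or a word not yet entered
    return found, missing

def category_assignments(words):
    """Upper bound on category sequences: the product of the homograph counts."""
    return math.prod(len(HOMOGRAPHS[w.lower()]) for w in words)

found, missing = lookup("Money becomes a hindrence".split())
print(missing)
print(category_assignments(["money", "becomes", "a", "hindrance"]))
```

The misspelled word is reported as missing, which corresponds to the need, noted above, to supply every misspelling to the dictionary by hand.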

The next illustration, Figure IX-2, shows the first parse performed by the Kuno predictive algorithm. To the scholar unfamiliar with such work, this parsing may seem a surprising example of artificial intelligence, for there is a great deal about it which would correspond with the analysis of a trained student of rhetoric. The first column is of course the list of "terminal" symbols, i.e., the words of the manifest English sentence. The second column is the "sentence structure." A little study will give some clue to the way this may be read. All words fall within the sentence "1", and we find the number 1 standing throughout. The word "money," as a simple subject, is only one syntactic step from the terminal representation, and therefore we find only "1S" for structural designation. On the other hand, the word "a" modifies "hindrance," and "hindrance" has the structural representation "1C" (where C stands for "complement"). Thus the article (or "adjective") "a" carries the designation "1CA". By such dependency relationships we have the 12-symbol depth of "the" and "best." These words both modify "things," which is the object of the preposition "of," which leads the prepositional phrase which modifies the noun "one," and so on back to the adverb clause headed by "when," which modifies the verb "becomes," the second word of the sentence. From the second column, one could thus draw a tree diagram of the sentence syntax.

The third column shows the particular syntactic category of that word for this particular parsing. A glance back to Figure IX-1 will show that all entries in this column appeared as possible homographs in the earlier output. And the fourth column is a verbal description of what that category is. The fourth column, then, depends completely on the third.

FIGURE IX-2

FIRST COMPUTER ANALYSIS OF SYNTAX OF A STUDENT SENTENCE

CORPUS NUMBER 01     SENTENCE NUMBER 000024     ANALYSIS NUMBER 1

ENGLISH       SENTENCE
SENTENCE      STRUCTURE      SWC    SWC MNEMONIC         SYNTACTIC ROLE

MONEY         1S             NOUS   NOUN 1               SUBJECT OF PREDICATE VERB
BECOMES       1V             VI2S   ADJ-COMPLEMENT VI    PREDICATE VERB
A             1CA            ART    PRO-ADJECTIVE        COMPLEMENT OF PREDICATE V
HINDRANCE     1C             NOUS   NOUN 1               COMPLEMENT OF PREDICATE V
WHEN          18R            CO2    ADVERB CONJ 1        CONJUNCTION
IT            18S            PRNS   PERSONAL PRN NOM     SUBJECT OF PREDICATE VERB
CEASES        18V            VT1S   NOUN-OBJECT VT       PREDICATE VERB
TO            18OVR          TOIS   TO FOR INFINITIVE    OBJECT INFINITIVE
AID           18OV           IT1    INFINITE VT1         OBJECT INFINITIVE
IN            18OVPR         PRE    PREPOSITION          PREPOSITION
THE           18OVPOA        ART    PRO-ADJECTIVE        OBJECT OF PREPOSITION
ATTAINMENT    18OVPO         NOUS   NOUN 1               OBJECT OF PREPOSITION
OF            18OVPOPR       PRE    PREPOSITION          PREPOSITION
ONE           18OVPOPO       NUMB   NUMERAL              OBJECT OF PREPOSITION
OF            18OVPOPOPR     PRE    PREPOSITION          PREPOSITION
THE           18OVPOPOPOA    ART    PRO-ADJECTIVE        OBJECT OF PREPOSITION
BEST          18OVPOPOPO     NOUC   NOUN 3               OBJECT OF PREPOSITION
THINGS        18OO           NOUP   NOUN 1               OBJECT OF OBJECT INFINITIVE
AND           1+             XCO    COORDINATE CONJ 1    COMPOUND PREDICATE VERB
BECOMES       1V             VI2S   ADJ-COMPLEMENT VI    PREDICATE VERB
A             1CA            ART    PRO-ADJECTIVE        COMPLEMENT OF PREDICATE V
GOAL          1C             NOUS   NOUN 1               COMPLEMENT OF PREDICATE V
ITSELF                       AV1    ADVERB 1             ADVERB
.             1.             PRD    PERIOD               END OF SENTENCE

[Columns RL NUM and PREDICTION POOL not legible.]

NOTE: Analysis produced by the Kuno Multiple-Path Syntactic Analyzer.

The fifth column, however, depends also on the actual sentence structure as diagnosed by the computer program. That is, it depends on what rules of the context-free grammar, what permissible grammatical constructions, were employed in order to yield this successful parsing of the sentence. And the final columns have to do with the way the parsing is carried out, with the push-down store operating at each step of the way.
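The remark that a tree diagram could be drawn from the second column can be illustrated with a short Python sketch. It rests on an assumption read off the discussion of Figure IX-2, namely that a word's head is the word whose structure code equals its own code with the last symbol removed (as "a", coded 1CA, modifies "hindrance", coded 1C). Only the four codes quoted in the text are used, and the function name `heads` is hypothetical.

```python
def heads(rows):
    """Map each word to the word it depends on, or None if it hangs from the sentence itself.

    Assumes the head's code is the dependent's code minus its final symbol,
    and that no two words in `rows` share a code.
    """
    by_code = {code: word for word, code in rows}
    return {word: by_code.get(code[:-1]) for word, code in rows}

# Structure codes quoted in the text for the first four words of the sentence.
rows = [("money", "1S"), ("becomes", "1V"), ("a", "1CA"), ("hindrance", "1C")]

# Print an indented outline: one level of indentation per symbol beyond the first two.
for word, code in rows:
    print("  " * (len(code) - 2) + word, code)

dependency = heads(rows)
print(dependency)
```

A full sentence would need a tie-breaking rule for repeated codes (the two occurrences of "becomes" both carry 1V in the first parse), which this sketch does not attempt.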

A continuing analysis of this parsing will, unfortunately, show that it does not completely match one's intuitive analysis of the later portions of the sentence. The second column indicates that "becomes" (fourth word from the end) is taken to be parallel with "becomes" (second word of the sentence). That is, it is taken to be part of a compound predicate of the word "money." But most of us would take this word to be part of a compound predicate of the word "it" (sixth word in the sentence). The distinction, from the standpoint of "meaning," is not a trivial one at all. The way this parsing "reads" the sentence is (in reduced form):

Money becomes a hindrance . . . and becomes a goal itself.

Whether the distinction would be important or trivial for a particular analysis would, however, depend on the empirical situation.

A variant parsing appears in the next illustration, Figure IX-3. This was the twenty-fourth "successful" parsing of this sentence, and shows a number of changes from the first one. We see that "becomes" (fourth from the end) is here diagnosed as parallel with "ceases," as it should be, and therefore is part of the compound predicate of "it." (There is a rather subtle change in another way here, however, in the diagnosis of the role of the infinitive "to aid.")

FIGURE IX-3

LATER COMPUTER ANALYSIS OF SYNTAX OF A STUDENT SENTENCE

CORPUS NUMBER 01     SENTENCE NUMBER 000024     ANALYSIS NUMBER 24

ENGLISH       SENTENCE
SENTENCE      STRUCTURE       SWC    SWC MNEMONIC         SYNTACTIC ROLE

MONEY         1S              NOUS   NOUN 1               SUBJECT OF PREDICATE VERB
BECOMES       1V              VI2S   ADJ-COMPLEMENT VI    PREDICATE VERB
A             1CA             ART    PRO-ADJECTIVE        COMPLEMENT OF PREDICATE V
HINDRANCE     1C              NOUS   NOUN 1               COMPLEMENT OF PREDICATE V
WHEN          18R             CO2    ADVERB CONJ 1        CONJUNCTION
IT            18S             PRNS   PERSONAL PRN NOM     SUBJECT OF PREDICATE VERB
CEASES        18V             VI1S   COMPLETE VI          PREDICATE VERB
TO            18VDVR          TOIS   TO FOR INFINITIVE    ADVERBIAL INFINITIVE
AID           18VDV           II1    INFINITE VI1         ADVERBIAL INFINITIVE
IN            18VDVPR         PRE    PREPOSITION          PREPOSITION
THE           18VDVPOA        ART    PRO-ADJECTIVE        OBJECT OF PREPOSITION
ATTAINMENT    18VDVPO         NOUS   NOUN 1               OBJECT OF PREPOSITION
OF            18VDVPOPR       PRE    PREPOSITION          PREPOSITION
ONE           18VDVPOPO       NUMB   NUMERAL              OBJECT OF PREPOSITION
OF            18VDVPOPOPR     PRE    PREPOSITION          PREPOSITION
THE           18VDVPOPOPOA    ART    PRO-ADJECTIVE        OBJECT OF PREPOSITION
BEST          18VDVPOPOPOA    ADJ    ADJECTIVE 1          OBJECT OF PREPOSITION
THINGS        18VDVPOPOPO     NOUP   NOUN 1               OBJECT OF PREPOSITION
AND           18+             XCO    COORDINATE CONJ 1    COMPOUND PREDICATE VERB
BECOMES       18V             VI2S   ADJ-COMPLEMENT VI    PREDICATE VERB
A             18CA            ART    PRO-ADJECTIVE        COMPLEMENT OF PREDICATE V
GOAL          18C             NOUS   NOUN 1               COMPLEMENT OF PREDICATE V
ITSELF        18D             AV1    ADVERB 1             ADVERB
.             1.              PRD    PERIOD               END OF SENTENCE

[Columns RL NUM and PREDICTION POOL not legible.]

Parsing went on and on, until there were 108 parsings of this sentence alone (a very high number in the present trials). The system has no way of automatically picking the "right" parsing from among the competitors. The knowledge about the world and about language habit which informs our own analysis has no present analogue in the serious and large-scale parsing programs. This is just the trouble with the present parsing systems, and with present linguistic knowledge, as was pointed out in an invited address by Anthony Oettinger to the 1967 Meeting of the American Documentation Institute.

But incomplete as our knowledge is, such analysis may still have much diagnostic interest and value. A great many branches of the parsing tree are pursued in such attempts, and information from these searches may have statistical value. Figures IX-4 and IX-5 show some of the statistical information which is produced by the Kuno algorithm. Such information may be useful for diagnosis of student errors, but an explanation of this possibility would take more space here than would be appropriate.
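How a single sentence can yield many competing parsings may be seen on a small scale in the following Python sketch. It is not the Kuno predictive analyzer: it swaps in a standard chart (CYK-style) count of parse trees, and the toy grammar, lexicon, and example sentence are all assumptions made for the illustration.

```python
from collections import defaultdict

# Toy grammar in binary form: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # phrase attaches to the verb
    ("NP", "NP", "PP"),   # phrase attaches to the noun
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def count_parses(words, start="S"):
    """Count the distinct parse trees for the word list under the toy grammar."""
    n = len(words)
    chart = defaultdict(int)  # (i, j, symbol) -> number of trees spanning words[i:j]
    for i, w in enumerate(words):
        for cat in LEXICON[w.lower()]:
            chart[(i, i + 1, cat)] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    chart[(i, j, parent)] += chart[(i, k, left)] * chart[(k, j, right)]
    return chart[(0, n, start)]

print(count_parses("I saw the man with the telescope".split()))
```

Each added prepositional phrase multiplies the attachments available, which suggests why a long student sentence with many homographs can reach a count such as 108.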

There may also be actuarial value in the ability of the program to parse any given sentence. The 50 student sentences were analyzed independently by an English scholar (Michael J. Zieky) as well as by the Kuno program, and the resulting two-way contingency layout is shown in Table IX-1. In this table the columns represent the human judgments of the 50 sentences, whether they were believed "grammatical" or "not grammatical." We see that 29 were grammatical, and 21 not so. On the other hand, the rows represent the ability of the program to find a successful parse for each of the 50 sentences. We find here that there were 29 successfully parsed, and 21 for which no parse was found. We find a very clear relation between the rows and the columns of this table. In fact, if these sentences might be assumed to be independent of one another, the resulting

FIGURE IX-4

STATISTICAL INFORMATION PRODUCED BY THE MULTIPLE PATH SYNTACTIC ANALYZER (SYNTAX DIAGNOSIS)

[Printout giving, for each of the 24 words of the sentence and in total: word number, English word, paths still active, grammar searches, block table addresses (sufficient and insufficient), total test failures, and extra subrules considered.]

FIGURE IX-5

FURTHER STATISTICAL INFORMATION PRODUCED BY THE MULTIPLE PATH SYNTACTIC ANALYZER (SYNTAX SUMMARY)

SYNTAX SUMMARY FOR SENTENCE NUMBER 000024     CORPUS NUMBER 01

SUMMARY OF PATH ELIMINATING TEST FAILURES

TYPE OF TEST                NUMBER OF FAILURES

POOL OVERFLOWS                       0
SHAPER OVERFLOWS                  9108
NESTER OVERFLOWS                  4198
NUMBER AGREEMENT                     0
CN                                4518
XC/XD                              868
CN/CM/ICAD                        4302
PA                                 312
SELF-EMBEDDING
COMPOUND COMPATIBILITY

START TIME 0.0 END TIME 0.0

0.0

TABLE IX-1

THE RELATION OF COMPUTER PARSING TO JUDGED GRAMMATICALNESS OF STUDENT SENTENCES

                            Human Judgment
Machine
Parsing          "Grammatical"    "Not Grammatical"    Row Sums

Parsed                24                  5               29
Not Parsed             5                 16               21

Column Sums           29                 21               50

NOTE: All machine parsing was done through the courtesy of S. Kuno, Harvard University.

Chi square = 15.04 (p < .001)*

Contingency coefficient = .48

*Data were not independent. See discussion in text.

chi square of 15.04 would be significant beyond the .001 level of confidence. And the related contingency coefficient would be a healthy .48. In other words, the ability of the program to parse a sentence would have some predictive power for whether the sentence would be judged grammatical by an expert human. Because of the casual way these sentences were drawn for computer analysis, the assumption of independence is not warranted; but the general trend of the results still suggests value in the actuarial use of such algorithms for computer analysis of essays.
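The two statistics reported for Table IX-1 can be checked with a few lines of Python. The report does not say which formula was used; the figure of 15.04 is reproduced by the chi square for a fourfold table with Yates's correction for continuity, so that formula is assumed here, and the function names are hypothetical.

```python
import math

def yates_chi_square(a, b, c, d):
    """Chi square for a 2 x 2 table [[a, b], [c, d]] with Yates's continuity correction."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def contingency_coefficient(chi2, n):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + N))."""
    return math.sqrt(chi2 / (chi2 + n))

# Cell counts from Table IX-1: parsed/grammatical, parsed/not, not parsed/grammatical, not parsed/not.
chi2 = yates_chi_square(24, 5, 5, 16)
coefficient = contingency_coefficient(chi2, 50)
print(f"chi square = {chi2:.2f}, contingency coefficient = {coefficient:.2f}")
```

Without the correction the same table gives a somewhat larger chi square (about 17.4), so the reported value is the more conservative of the two.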

The data from the comparison are presented in a different way in Table IX-2. Here we are able to review the computer analysis of the sentences. Ideally, of course, every sentence should produce only one parse, and that one should be the same as that of an expert human. Nevertheless, it is important that those sentences which were grammatical had, on the average, many more completed parsings than those which were not grammatical. And it is interesting that the median number of parsings for grammatical sentences was 3, but 0 for the ungrammatical ones.

It is also interesting to observe, in Table IX-2, the order in which the correct parsings occurred. Only 16 parsings were judged as intuitively faultless. Seven of these occurred on the 1st trial, 6 on the 2nd, and the others as shown. The present Kuno program, outstanding as it is, has made no provision for statistical optimization, and this performance should be improvable in some appropriate adaptation.

In order to have a similar parser for experimental purposes, we have undertaken to make a PL/I version of the predictive parser, programmed for the Project by Gerald Fisher, and listed in Appendix D. Appendix D also has the flowchart of that parser, which may help the reader new to such strategies to understand their nature. This program,

TABLE IX-2

MACHINE PARSING PERFORMANCE OF GRAMMATICAL AND UNGRAMMATICAL STUDENT SENTENCES

                                       Human Judgment

                               "Grammatical"    "Not Grammatical"

N Sentences                         29                 21

Mean of Machine Parsings            23.68               8.14

Median of Machine Parsings           3                  0

N Parsings Judged Correct           16

Order of "Correct"                   7 on 1st
Machine Parsings                     6 on 2nd
                                     1 on 3rd
                                     1 on 38th
                                     1 on 52nd

NOTE: All machine parsing was done through the courtesy of S. Kuno, Harvard University.

called PARSE, has been debugged and tested with artificial information, but not yet with natural language. Such a parsing program is only the vehicle; the content must be furnished by (1) a suitable dictionary; and (2) a suitable set of grammatical rules. This brief chapter is not the place to set down all the considerations which will play a role in any further development of such linguistic processors, but a few points will be suggested by the next sections.

Discourse analysis. Further pondering of the sentence parsing will reveal some difficulties not considered by the multiple path analyzer. If one is concerned about "meaning" and about how a machine "reads" a sentence, then one must arrange for the prose of an essay to hang together, in some sort of cognitive net. The token sentence of Figure IX-2 will illustrate this problem. No provision is made for the analysis of antecedents or referents: the pronoun "it" is not tied in any mechanical way to the word "money" which it presumably renames. But pronouns are not the only offenders in such a simplified analysis. In most prose, such as scientific writing, a large proportion of the nouns refer in some abbreviated way to persons, objects, or ideas which have already been treated in the writing. The human reader at once connects these new expressions with those which have gone before, but how this is accomplished is not yet understood very clearly.

J. Olney and D. Londe, of the System Development Corporation, are among the very few who have given computational attention to this problem, and their brief writings are not yet ready for any broad dissemination (personal communications). There are clearly some explicit cues which may be helpful (such as number, gender, person). There are synonym relationships also, some of which may be discovered through mechanical use of a large dictionary.

There are also questions of proximity; other things being equal, one would expect the most recent candidate for referent to be operative. Standard techniques of optimization may weight such criteria appropriately and may make a best-guess selection of reference for pronouns or other anaphoric expressions. A great deal of work is necessary, then, in this field of discourse analysis.
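A best-guess selection of this kind can be sketched as a weighted score over the cues named above (number, gender, proximity). The Python below is only an illustration: the weights are arbitrary placeholders rather than optimized values, and the class `Mention`, the functions `score` and `resolve`, and the example sentence are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    position: int   # index of the word in the running text
    number: str     # "sg" or "pl"
    gender: str     # "m", "f", or "n"

def score(pronoun, candidate, w_number=2.0, w_gender=2.0, w_proximity=1.0):
    """Weighted agreement score; candidates at or after the pronoun are excluded."""
    if candidate.position >= pronoun.position:
        return float("-inf")
    total = 0.0
    if candidate.number == pronoun.number:
        total += w_number
    if candidate.gender == pronoun.gender:
        total += w_gender
    # Other things being equal, the most recent candidate scores highest.
    total += w_proximity / (pronoun.position - candidate.position)
    return total

def resolve(pronoun, candidates):
    """Return the candidate with the highest score as the best-guess referent."""
    return max(candidates, key=lambda c: score(pronoun, c))

# "The teachers praised the essay because it was clear."
teachers = Mention("teachers", 1, "pl", "m")
essay = Mention("essay", 4, "sg", "n")
it = Mention("it", 6, "sg", "n")
best = resolve(it, [teachers, essay])
print(best.text)
```

In the student sentence of Figure IX-2 the nearest agreeing noun to "it" is "hindrance" rather than "money," which shows why proximity and agreement alone cannot settle every case and why the weights would need empirical tuning.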

Transformational grammar. Of course, one of the most active areas for current linguistic research is in transformational grammar. Treatments of this topic may be found in a number of references (e.g., Hays, 1967, ch. 8). Perhaps the best recent treatment of the topic, especially from the viewpoint of computation, is by Keyser and Petrick (in press), both of whom have served as consultants for Project Essay. Perhaps the most useful program for transformational analysis is that described in Petrick's thesis (1965).

John Moyne and David Loveman, at the IBM Boston Programming Center, have programmed a very limited system which carries analysis through a syntactic analysis to a transformational analysis, and prints out appropriate answers to questions. Like all such extant systems, this one is for a special purpose, in this case document retrieval from a large library. And they have processed a few student sentences, from Project Essay, on an experimental basis, through their first, surface-structure parser.

Semantics. In general, transformational grammars are far from any linguistic perfection, and face deep problems which will not be described here. Yet there are approaches to the question of meaning which have some demonstrable usefulness and power, and which may sidestep these deepest problems for the purposes of application. Some of these are generally described as involved with "semantics," and some are framed within the practical problem of question-answering systems. Still others are spoken of in terms of information storage and retrieval, especially what is spoken of as "fact retrieval," as contrasted with "document retrieval." These works share a common concern with the way that information may be read into some data representation in the computer, and how it may then be made accessible for further use. William A. Woods (1967), for example, took for granted the output from some syntactic and transformational parsing system, and then asked how he could develop a question-answering system. His particular corpus was flight information from the Airlines Guide, and he worked out operators for logical comparison and other semantic concerns which would implement such a system. In doing so, he built upon earlier work with BASEBALL (see Feigenbaum and Feldman, 1963, Sec. 5), and similar systems, but went beyond his predecessors in certain important ways. Other new work is that of Quillian (1966), who has provided a way of storing semantic relationships. His structures permit comparison between two statements, and make possible judgments about them concerning their agreement, disagreement, or irrelevance.

The importance of symbolic logic in such systems is apparent in the recent work by Levien and Maron (1967). These authors use the predicate calculus, with binary relations only, as a universal tool for fact storage. They organize a data base which has four different ways of random access (corresponding to sentence number, relation name, and the two elements) for rapid retrieval of the fact through any of its components. Their method is wasteful of storage space, but extremely rapid in operation, able to locate any fact without poring through lists. Their system thus enjoys some important virtues of the psychological models.
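A store of binary relations reachable through four access paths can be sketched in a few lines. The following Python is an illustration of the general idea only, not a description of the Levien and Maron implementation; the class name `FactStore`, its method names, and the example facts are assumptions made here.

```python
from collections import defaultdict


class FactStore:
    """Binary-relation facts, each reachable through any of four indexes."""

    def __init__(self):
        self.facts = []                        # access 1: sentence number
        self.by_relation = defaultdict(list)   # access 2: relation name
        self.by_first = defaultdict(list)      # access 3: first element
        self.by_second = defaultdict(list)     # access 4: second element

    def add(self, relation, first, second):
        """Store relation(first, second) and return its sentence number."""
        number = len(self.facts)
        self.facts.append((relation, first, second))
        self.by_relation[relation].append(number)
        self.by_first[first].append(number)
        self.by_second[second].append(number)
        return number

    def lookup(self, relation=None, first=None, second=None):
        """Return the facts matching every component that was supplied."""
        hits = None
        for index, key in ((self.by_relation, relation),
                           (self.by_first, first),
                           (self.by_second, second)):
            if key is None:
                continue
            found = set(index.get(key, []))
            hits = found if hits is None else hits & found
        if hits is None:  # no component given: return everything
            hits = set(range(len(self.facts)))
        return [self.facts[n] for n in sorted(hits)]
```

Every fact is entered in all four structures, which is the trade the text describes: storage is spent freely so that no lookup has to scan the full list of facts.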

There is much work going on, then, in fields with important relevance for the future of essay analysis. It takes the form of progress in linguistics, psychology, and computer science, and elements of statistics and logic have a bearing as well. Surely, Project Essay must maintain its close contacts with these fields in relation to its future work.

3. Future Work in Essay Analysis

Need for Flexibility. The sub-discipline of computer analysis is only now beginning to take shape. In the meantime, as we have seen, work in the area seems to call for a rather unusual approach: interdisciplinary, broad in purpose, and flexible.

In its present development, the computer analysis of essays does not yet lend itself to the clear, Fisherian, "classical" experimental designs, because not all operations can be foreseen. It does, however, permit clear procedures of dynamic development and exploration at each stage of the study, and verification of accomplishment at the end. Properly understood, these characteristics are not handicaps, but symptoms of large research scale. In a recent paper, Baker (1965) pointed out that the larger and more exploratory research project "must be inherently dynamic and possess the ability to change its internal structure without sacrificing the rigor of the design" (p. 15). And another writer (Doyle, 1965) has recently stated that as a study approaches the "basic research end of the spectrum, it becomes more and more imperative to be free to alter the plan. Indeed, in basic research altering the plan ought to be a state of mind." With the present study, it would be mistaken and even misleading to commit the investigation prematurely to too narrow a path.

The first phases of this study illustrate this point. In the earlier work, only the most general goal then, as now, was completely operational, foreseeable, and attainable: the maximization of the correlation between computer-analyzed prose characteristics and the human judgments of the prose. The earlier work has reached this goal (so far as possible during the time permitted), but many paths were altered along the way. Programming plans were modified and improved. Certain hypotheses were reformulated, and others discarded. At the conclusion of the first phases, the progress has been much greater than if the inevitable misconceptions of the beginning had been adhered to in spite of everything. The ultimate goal, however, was rigorously adhered to, and the most careful investigatory techniques employed at each decision point along the way. In the newer research designs, what must be done, rather than to make all of the decisions before the choice points are reached, is to illustrate the quality of decision-making. This portion of the proposal, and that which follows, are intended to state the general objectives and the decision-making strategy by which these goals will be attained.

As noted before, the work reported here has already identified useful computer-analyzable indicators of student writing skill, and has demonstrated the potential feasibility of overall theme evaluation by computers. When holistic grades are desired, or ratings of important essay traits, the PEG computer program already assigns marks as accurately (measured against the criterion of multiple expert judgments) as the individual, trained English teacher. Future work should expand the work to the analysis and evaluation of content, and deepen it linguistically and psychologically by investigation of more humanoid processes. Some general future objectives may be outlined:

1. To expand consideration to essay content as well as style.

2. To explore the relation of dictionary strategies to successful analysis, and to develop optimum strategies for the Random House Dictionary tape.

3. To analyze computer-generated data in relation to subjective measures of content and style in the early secondary years, to increase usefulness of analysis.

4. To improve the programming of on-line correction of essays, and on-line feedback to the student or teacher.

5. To identify future strategies for deeper exploration of this new field of educational technology.

Grading of content. Just as we have opened up the possibility of grading the esthetic traits of an essay in English, so we should also be interested in the possibility of judging the substantive content of essay material, apart from the general writing ability of the student. This is a dimension of essay analysis not yet attempted within this project, yet it may be approached at a number of different levels of sophistication, and some of these might prove both economical and rewarding. Let us consider a sample problem in American history, to conceptualize these various levels, first heuristically, and finally in more hypothetical but technical detail.

Suppose we wished to grade children on the factual content of an essay about the discovery of America. It might be supposed that certain words or phrases should appear in the more complete essays: Columbus, Christopher, Ferdinand, Isabella, king, queen, Spain, Azores, 1492, Nina, Pinta, Santa Maria, Indians, etc. These words and others could be fed into core as a kind of dictionary, much as has been done already with such lists as prepositions, misspellings, common words, etc. Each first use of any of these Columbus expressions could be scored in some fashion. No doubt such scores would be positively correlated with "factual completeness" ratings as assigned by human judges. Such scoring would therefore be an aid in achieving the simulation sought for in Quadrant I.A of Figure II-1.

Suppose we asked for meaningful relationships among these and other words. One evidence of such a relationship might be to have the word Isabella occur in the same sentence as the phrase queen of Spain. And such use within the same sentence should perhaps receive a higher score than use in different sentences. Again, the consideration is actuarial, yet now the statistical analysis is one small step closer to a meaningful relationship between ideas.

Of course, at a somewhat higher level, we would not wish too high a premium placed on arbitrary words, so we might include monarch or sovereign in core storage as acceptable equivalents of queen, or Isabel as an acceptable form of Isabella. Synonyms could possibly be scored quite sensitively, according to their judged "semantic distance" from the most desired words.

Or we could look further for meaning, by asking that, within the sentence, Isabella and queen (or equivalents) be in some standard form suggesting identity. Some common ways this might be done are as a title (Queen Isabella), as an appositive (Isabella, queen . . . or queen . . . Isabella), or as a predicate nominative (Isabella . . . [form of to be] . . . queen, or inverse). And such evidence of identity could be scored somewhat more highly than the appearance together without such evidence.

Now consider a much more advanced system. Note that if we have a sufficiently sophisticated general dictionary available, and an adequate general sentence analyzer, we will not need to anticipate each specific equivalent expression or relationship in each specific essay examination, in order to score it. We can instead read in a key in the narrative form of English sentences containing some model about Isabella and Columbus. Various equivalences would then be potentially available for the grading of the student's "own words." But here we are clearly in the I.B Quadrant of Figure II-1, and are doing a kind of "master analysis."

The progress here is from the employment of a simple lexicon of key words, to the acceptance of their synonyms, to the search for the key words or synonyms in appropriate contexts, to the search for these meanings in appropriate relationships.
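The first three of these levels (key words, synonyms weighted by semantic distance, and co-occurrence within a sentence) are simple enough to sketch. The Python below is a hypothetical illustration, not part of PEG: the lexicon entries, the distance values, the pair bonus, and the function names are all assumed for the example. Each concept counts once per essay, at the best form in which it appears.

```python
import re

# Hypothetical key lexicon: each desired word maps to acceptable forms,
# each with an assumed "semantic distance" (0.0 = the desired word itself).
LEXICON = {
    "isabella": {"isabella": 0.0, "isabel": 0.1},
    "queen": {"queen": 0.0, "monarch": 0.3, "sovereign": 0.3},
    "columbus": {"columbus": 0.0},
    "spain": {"spain": 0.0},
}
# Concept pairs that earn a bonus when found in the same sentence (assumed).
RELATED = [("isabella", "queen")]
PAIR_BONUS = 0.5


def concepts_in(sentence):
    """Map each key concept found in a sentence to its smallest distance."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    found = {}
    for concept, forms in LEXICON.items():
        distances = [forms[w] for w in words if w in forms]
        if distances:
            found[concept] = min(distances)
    return found


def content_score(essay):
    """Score each concept once, plus a bonus for same-sentence pairs."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    best = {}      # concept -> smallest distance seen anywhere in the essay
    pairs = set()  # related pairs seen together in one sentence
    for s in sentences:
        found = concepts_in(s)
        for concept, d in found.items():
            best[concept] = min(d, best.get(concept, 1.0))
        for a, b in RELATED:
            if a in found and b in found:
                pairs.add((a, b))
    lexical = sum(1.0 - d for d in best.values())
    return lexical + PAIR_BONUS * len(pairs)
```

Under these assumed values, "Isabella was queen." scores higher than "Isabella ruled. The queen paid.", since only the first places the two words in one sentence. The fourth level, the search for standard forms such as title or appositive, would require the output of a sentence analyzer and is not attempted here.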

For any ultimate, applied evaluation of essay content, a computer program should be no more elaborate than necessary for the overall goal. If it is much cheaper to use the lower, lexical strategy, and if it is almost as accurate, then it would only waste machine time to compute higher-level information which will not be used. On the other hand, in some essays the special vocabulary may not be very important, and other factors may control the evaluation of merit. The Columbus example seems to depend highly on vocabulary, but there may be others in which all students use essentially the same special vocabulary, and the discriminations are at the higher contextual and relational levels.

One early discovery needed, then, is the degree to which most school essay evaluation is dictionary-loaded. And some workers are addressing themselves to this need, with some college level examinations, at the time of writing.

Another purpose is to seek more advanced strategies of semantic analysis, of the contextual or relational sort. These strategies have some antecedents as well. Most techniques of information retrieval, for example, are based upon co-occurrences (cf. pp. 310-353 in Garvin, 1963). And the usual employment of the General Inquirer system employs such contextual techniques (Stone, 1966). As we have said, still more advanced systems of relational semantic analysis have been programmed by such workers as Woods (1967), or John Moyne (of IBM's Boston Programming Center). An impressive attack on the problem of artificial memory appropriate to such relationships has been made by M. Ross Quillian (1966). These workers have already consulted informally with this project, and will be available for further work, and some have already done some pilot tasks.

Further analysis of style. Surely workers should not abandon the analysis of esthetic quality in writing, but rather use available advanced strategies of meaning to further such judgment. To some extent, the analysis of style must be paired with meaning, and it is hoped that the next years will see advances in the description of surface structure, and at least some preliminary consideration of stylistic traits such as synonym, contrast, and parallelism, all of which have a large semantic content.

On a tentative basis, as we have described, some high school essays have been already analyzed by parsing programs, one written in FAP for the IBM 7094 by Susumu Kuno (1964) at Harvard, and the other in PL/I by David Loveman and John Moyne at the Boston Programming Center. These have indicated that partial parsing is already available, but that further adaptation of any parser will be necessary. To some extent, the output of a parser will be used to inform the semantic analysis. During the next years of such work parsing will be carried much further than at present.

Hypothetical complete essay analyzer. Some reasonable future objectives of workers have been stated above. These are realizable and useful objectives, and can probably be obtained within reasonable limits of time and effort. Nevertheless, it is informative to construct a more distant objective, which would be a set of computer routines tied together in a more complete and humanoid essay analyzer.

Anticipated future strategies are currently summarized in Figure IX-6. This figure is based partly on work already accomplished, partly on suggested minor adaptations of systems already working for others, and partly on projected programs which are not yet operative in any system, but which do not seem impossibly difficult at the efficiency desired.

FIGURE IX-6

HYPOTHETICAL COMPLETE ESSAY ANALYZER

1. INPUT and PUNCH. Handwritten or typewritten or other raw response of the writer is converted for computer input.

2. SNTORG. Creates arrays of words and sentences as found in prose. This is just as performed in PEG, or by a PL/I version called SCORTXT.

3. DICT. Assignment of available syntactic roles to each word. This is currently done by many programs, but needs an expanded dictionary, and ambiguity resolver. At the same time, the semantic information will be stored in the work-space for reference of other parts of program. The tape-written Random House Dictionary (Unabridged) is a very valuable facility for this work.

4. PARS. A modified Kuno (1964) program such as PARSE seems most promising, and the skeleton is now available in PL/I. Alterations will be necessary to accept well-formed substrings, and to work out dictionaries and grammars of appropriate power.

5. REFER. This is intended to identify and encode the most likely referents of pronouns and other anaphoric expressions. This process must employ both syntactic features and probably semantic information from DICT or other sources.

6. KERNEL and STRUC. From the rewritten string output of (5), KERNEL would establish a set of elementary propositions, and STRUC would encode the relationships among these elements. This step would retain the information of an essay in simplest possible units, yet would retain additional information about emphasis, subordination, causal relation, etc., among these units.

7. EQUIV. The elementary units would be augmented by the semantic information in DICT. To each word would be assigned a cluster of permissible synonyms, with weightings of semantic distance. This permits an analysis of redundance and emphasis in the essay, and permits a comparison of the content of the student essay with that of the key or master essay.

8. STYLE. Descriptions of the surface structure characteristics of the essay: parts of speech, organization of themes, types and varieties of sentence structure, grammatical depths, tightness of reference, etc.; information about grammatical errors and strengths.

9. CONTNT. Comparison of the agreement of student and master essay, through measure of kernel hits and struc hits, these weighted by semantic distance of language chosen.

10. SCOR. Multivariate prediction of appropriate profile for the immediate purpose.
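To make the CONTNT step more concrete, the comparison of kernel hits weighted by semantic distance might look roughly like the Python below. Everything in it is hypothetical: the figure describes routines that did not yet exist, so the representation of a kernel as a (subject, relation, object) triple, the `EQUIV` table, the distance values, and the function names are assumptions for illustration. Struc hits are not modeled.

```python
# Hypothetical synonym clusters in the spirit of EQUIV: each word maps to
# (canonical form, assumed semantic distance from that form).
EQUIV = {
    "isabella": ("isabella", 0.0),
    "isabel": ("isabella", 0.1),
    "queen": ("queen", 0.0),
    "monarch": ("queen", 0.3),
    "spain": ("spain", 0.0),
    "columbus": ("columbus", 0.0),
}


def canonical(kernel):
    """Reduce a (subject, relation, object) kernel to canonical words and a weight."""
    words, weight = [], 1.0
    for term in kernel:
        form, distance = EQUIV.get(term.lower(), (term.lower(), 0.0))
        words.append(form)
        weight *= 1.0 - distance  # each substitution lowers the weight
    return tuple(words), weight


def contnt(student_kernels, master_kernels):
    """Weighted fraction of master kernels that the student essay also states."""
    if not master_kernels:
        return 0.0
    student = {}
    for k in student_kernels:
        key, w = canonical(k)
        student[key] = max(w, student.get(key, 0.0))
    total = 0.0
    for k in master_kernels:
        key, _ = canonical(k)
        total += student.get(key, 0.0)  # a miss contributes nothing
    return total / len(master_kernels)
```

A student kernel stated in the master essay's own words counts fully; one stated through more distant synonyms counts for less; a master kernel the student never states counts for nothing.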

The limitations of space will permit only a few comments on this figure. For large grading systems, over established substantive content, it would be possible, for the key or master essay, to edit by hand the output from certain routines (especially REFER and STRUC). Of course, four of the most important routines listed in Figure IX-6 are far from perfected in any existing programs. Ideally, they would assume better solutions to certain major, stubborn problems in computational linguistics.

Indeed, certain steps in this hypothetical essay grader are close to the heart of some of the most persistent and troublesome problems in linguistics. Is it necessary that sentences be syntactically analyzed before mapping into deep structure? What is the proper role of semantics in such deep structure? How can the outside knowledge of the reader be incorporated into the machine analysis? In general, how may we incorporate some of the intuitive richness which the literate human brings to his grading?

Surely, in essay analysis workers will not suddenly resolve all such questions. These questions so trouble linguists as to contribute to the recent official pessimism, in the United States, about the future of mechanical translation. After 15 years of effort, mechanical translation is still regarded as disappointing in quality, and virtually no sustained output of any machine program would be ordinarily mistaken for the work of a professional human translator.

On the other hand, the earliest attempts at essay grading by computer have, in a very limited way, leaped ahead of machine translation. And if the expert human ratings of high school essays may be regarded as an acceptable goal, then the machine program appears to have reached such a goal already. For that matter, improved performance, even superior to that of the individual human expert, appears to be immediately practicable as well.

The explanation of this advantage, of course, is that the problem of essay grading as attacked in the current work is much easier than the problem of machine translation. In translation, every nuance of the input string should be accounted for in the output string. In essay grading, only a certain portion of the input text needs to be accounted for, and the output does not depend on the existence of any large language-generating system. High quality machine translation apparently demands a fair portion of the total language-manipulating capability of the human, but essay grading may use only a fraction of it, and may process language in ways quite different from that of the human being. For example, our present programs have to date largely ignored order and sequence in the essays, although to the human the order of words is, of course, of crucial and unceasing importance.

Since essay grading can work with such fractional information, then, why pursue the deeper analysis of Figure IX-6? Clearly, the purpose is not entirely the same as it would be for the usual linguist. At any discrete time in research, what is sought is not necessarily the perfect humanoid behavior, but rather those portions of that behavior which, given any current state of the art, will contribute optimally to efficient and practicable improvements in output. Indeed, regardless of the eventual perfection of deep linguistic behavior, for any specific application to essay analysis, at any one moment, large portions of such available behavior may be irrelevant, just as it seems that ordinary human language processing does not usually call for our full linguistic effort.

Yet we regard it as eventually important to be able to perform these various kinds of advanced machine analysis when required. Therefore, the eventual uses of the ideal essay analyzer may require analytic capability as deep as may be imagined. Writing out suitable comments for the student, for example, will in some cases tax any system which may be foreseen.

Even approximate solutions to these problems, however, though unsatisfactory for certain scientific purposes, could make important contributions to the educational description and evaluation of essays. For such evaluation is itself probabilistic, limited by imperfect asymptotes of writer consistency and rater agreement. And such evaluation therefore does not require, to be practicable and satisfactory, a deterministic perfection. There is a fundamental difference in goals which must be realized. As has been demonstrated here, the output from much cruder statistical programs has already reached a quality not too remote from usefulness. The more advanced strategies currently seem, at least to the present workers, bright with promise, for an ultimate target of such analysis, subject to alteration and amendment as more is learned about the nature of essays and about the evaluative process.

In conclusion, this section on the future has aimed, first, at explaining the special nature of objectives in a new, exploratory, and developmental research; second, at briefly listing concise and obtainable objectives; third, at explaining appropriate goals in the evaluation of subject-matter content and in the appropriate use of dictionaries; fourth, at explaining the relation between objectives of stylistic analysis and objectives of subject-matter; and fifth, at setting forth ultimate objectives in a humanoid, hypothetical analyzer which, while it will never be completely realized, will be a target for the accomplishment of the immediate future. Surely, the computer analysis of language will become a permanent feature of the educational scene.

APPENDIX A

0-445 PEG I FORTRAN SOURCE LIST 09/26/66 PAGE 1
ISN SOURCE STATEMENT
0 $IBFTC PROJECT ESSAY

C THIS IS THE COMPLETE SOURCE PROGRAM FOR ESSAY ANALYSIS, 1966-67,
C SUPPORTED BY THE COLLEGE ENTRANCE EXAMINATION BOARD,
C AND MORE BY THE UNITED STATES OFFICE OF EDUCATION
C AND CARRIED OUT AT THE BUREAU OF EDUCATIONAL RESEARCH AT THE UNIVERSITY
C OF CONNECTICUT, IN STORRS, WHERE ELLIS B. PAGE WAS DIRECTOR OF THE
C PROJECT, AND MR. AND MRS. GERALD FISHER WERE PRINCIPAL PROGRAMMERS.

C IN THE EARLY WORK, HIGH SCHOOL ESSAYS WERE ANALYZED BY
C COMPUTER FOR 30 VARIABLES. THESE VARIABLES ARE
C TRANSFORMED BY THE SAME PROGRAM TO APPROPRIATE SCALES (USUALLY
C RATIOS). THEN MULTIPLE REGRESSION MAY BE PERFORMED TO PREDICT THE
C POOLED HUMAN JUDGMENTS OF THESE ESSAYS. THE PRESENT PROGRAM
C DOES THE CENTRAL TASKS OF SENTENCE ORGANIZATION AND WORD LOOKUP
C THAT ARE IMPORTANT IN ALMOST ANY NATURAL LANGUAGE ANALYSIS.

C ADAPTIONS OF THIS PROGRAM MAY BE MADE RATHER EASILY.
C INQUIRIES MAY BE ADDRESSED TO DR. PAGE AT THE BUREAU OF EDUCATIONAL
C RESEARCH, UNIVERSITY OF CONNECTICUT.

C THIS IS THE MAIN PROGRAM, CONTAINING
C THE GROSS LOGIC FOR PROCESSING THE ESSAYS.
C THE BEGINNING STATEMENTS OF EACH PROGRAM SPECIFY THE
C INTER-RELATIONSHIPS AMONG THE PARTS OF STORAGE AND WORD TYPES AND
C WORD NAMES.
C THESE STATEMENTS ALLOW FLEXIBILITY IN WORD HANDLING, COMPRESSION
C IN PROGRAMMING, AND THE NECESSARY COMMUNICATION LINKS BETWEEN
C THE INDIVIDUAL SUBPROGRAMS.
C THE WORDS, THEIR CONTENTS, THEIR ALTERNATIVE NAMES AND THEIR
C LOCATIONS IN STORAGE ARE DEFINED ON AN ACCOMPANYING ALPHABETIZED
C LIST.
C BESIDES THE TABLES OF CHECKLIST WORDS, INPUT CONSISTS OF ESSAYS
C WHICH ARE PUNCHED ONE LINE PER CARD, USING UP TO 80 COLUMNS OF THE
C CARD.
C THE FIRST CARD OF EACH ESSAY IS PRECEDED BY AN IDENTIFICATION CARD
C WHICH CONTAINS THE IDENTIFICATION NUMBER OF THIS ESSAY IN COLUMNS 1-5
C AND THE TITLE INDICATOR (WHICH IS BLANK IF NO TITLE IS PRESENT).
C FOLLOWING THE LAST CARD OF EACH ESSAY IS THE END CARD WHICH
C CONTAINS AN ASTERISK (*) IN COLUMN 1, AND A BLANK IN COLUMN 2.
C FOLLOWING THE LAST END CARD IS THE END OF JOB CARD WHICH
C CONTAINS 99999 IN COLUMNS 1-5.

C THE OUTPUT CONSISTS OF PRINTED LINES: ONE FOR EACH SENTENCE, AND
C AN ADDITIONAL ONE FOR EACH ESSAY CONTAINING IN ARRAY
C ORDER THE CONTENTS OF THE SUMS ARRAY AND THE TOT ARRAY.
C THESE CONTENTS ARE DESCRIBED ON THE ALPHABETIZED LIST.
C FOLLOWING IS THE SET OF TRANSFORMATIONS, FROM WHICH
C THE REGRESSION ANALYSIS MAY BE RUN.
C THE SUMMARY ESSAY DATA ARE ALSO PUNCHED IN CARDS FOR IMMEDIATE USE.
C WE WRITE THE PRINTED INFORMATION ON TAPE UNIT 0


C WE WRITE THE TOTAL INFORMATION ON TAPE UNIT 4

C READ THE WORD LISTS AGAINST WHICH PARTS OF EACH ESSAY WILL BE CHECKED

NAME    COMMON  TYPE     ARRAY NAME    CONTENTS
A       CHAR    REAL     ALPHA% 1<     ABBBBB
APOSTR  CHAR    REAL     PUNCT% 6<     BBBBB
B       CHAR    REAL     ALPHA% 2<     BBBBBB
BLANK   CHAR    REAL     PUNCT%        BBBBBB
BROKUP  IN      REAL     BROKUP        80 WORD CARD IMAGE 1 COLUMN PER
C       CHAR    REAL     ALPHA% 3<     CBBBBB
CONNEC  LISTS   DOUBLE   RDTBL%162<    30 CONNECTIVES
COMMA   CHAR    REAL     PUNCT% 4<     ,BBBBB
COLON   CHAR    REAL     PUNCT%11<     ..BBBB
CPAREN  CHAR    REAL     PUNCT% 8<


NUMSEN  OUT     INTEGER  TOT% 3<       NO OF SENTENCES THIS ESSAY
NUMPAR  OUT     INTEGER  TOT% 4<       NO OF PARA THIS ESSAY
NUMWDS  OUT     INTEGER  SUMS% 8<      NO OF WORDS THIS SENTENCE
NUNDER  OUT     INTEGER  [illegible]   NO OF [illegible] WORDS THIS SENTE
NWDSQ   OUT     INTEGER  SUMS% 9<      SQ OF NO OF WDS THIS SENTENCE
O       CHAR    REAL
OPAREN  CHAR    REAL     PUNCT% 7<     %BBBBB
P       CHAR    REAL     ALPHA%16<     PBBBBB
PARNUM  OUT     INTEGER  SUMS% 4<      SEQ NO OF THIS PARAGRAPH
PERIOD  CHAR    REAL     PUNCT%10<     .BBBBB
PREP    LISTS   DOUBLE   RDTBL%42<     50 PREPOSITIONS
Q       CHAR    REAL     ALPHA%17<     QBBBBB
QUEST   CHAR    REAL     PUNCT%14<     .QBBBB
R       CHAR    REAL     ALPHA%18<     RBBBBB
RDTBL   LISTS   REAL     RDTBL         10540 WORDS CONTAIN WORD TABLES
RELPRO  LISTS   DOUBLE   RDTBL%142<    10 RELATIVE PRONOUNS
S       CHAR    REAL     ALPHA%19<     SBBBBB
SENNUM  OUT     INTEGER  SUMS% 3<      SEQ NO THIS SENTENCE
SENTYP  OUT     INTEGER  SUMS%29<      1 IF DECLAR A, 0 IF NOT
SENTYP  OUT     INTEGER  SUMS%30<      1 IF DECLAR B, 0 IF NOT
SENTYP  OUT     INTEGER  SUMS%31<      1 IF EXCLAM, 0 IF NOT
SENTYP  OUT     INTEGER  SUMS%32<      1 IF QUESTION, 0 IF NOT
SEMIC   CHAR    REAL     PUNCT%12<     ./BBBB
SLASH   CHAR    REAL     PUNCT% 9<     /BBBBB
SPELLX  LISTS   DOUBLE   RDTBL%6222<   2000 COMMONLY MISSPELLED WORDS
SSQLET  OUT     INTEGER  SUMS% 7<      SUM OF SQ OF LETTERS BY WORD THIS
STAR    CHAR    REAL     PUNCT% 2<     *BBBBB
SUBVER  OUT     INTEGER  SUMS% 5<      1 FOR S-V TYPE OPEN 0 FOR NO
SUMLET  OUT     INTEGER  SUMS% 6<      SUM OF THE LETTERS THIS SENTENCE
SUBCON  LISTS   DOUBLE   RDTBL%2<      20 WDS FOR SUB CONJ TEST
SVOPEN  OUT     INTEGER  TOT% 5<       NO OF SENT OPENING S-V
SVOPN   LISTS   DOUBLE   [illegible]   10 WORDS FOR S-V OPEN TEST
T       CHAR    REAL     ALPHA%20<     TBBBBB
TAPOS   OUT     INTEGER  TOT%11<       NO OF APOSTROPHES THIS ESSAY
TCOMMA  OUT     INTEGER  TOT%12<       NO OF COMMAS THIS ESSAY
TCOLON  OUT     INTEGER  TOT%17<       NO OF COLONS THIS ESSAY
TCONN   OUT     INTEGER  TOT%23<       NO OF CONNECTIVES THIS ESSAY
TDASH   OUT     INTEGER  TOT%16<       NO OF DASHES THIS ESSAY
TDALE   OUT     INTEGER  TOT%27<       NO OF DALE WORDS THIS ESSAY
TEXCLA  OUT     INTEGER  TOT%20<       NO OF EXCLAMATION PTS THIS ESSAY
TENDPT  OUT     INTEGER  TOT%28<       NO OF SENT WITH NO END PNCT THIS
TEXT    IN      DOUBLE   HLFTXT        ASSEMBLED WORDS OF THIS SENTENCE
        OUT     INTEGER  TOT% 1<       IDENT NO THIS ESSAY
TITLE   OUT     INTEGER  SUMS% 2<      1 IF YES TITLE, 0 IF NO TITLE
TOTLET  OUT     INTEGER  TOT% 6<       SUM OF LETTERS THIS ESSAY
TOTWDS  OUT     INTEGER  TOT% 8<       NO OF WORDS THIS ESSAY
TPAREN  OUT     INTEGER  TOT%10<       NO OF PARENTHESES
TPER    OUT     INTEGER  TOT%13<       NO OF PERIODS THIS ESSAY
TPERCT  OUT     INTEGER  TOT%14<       NO OF PERCENT SIGNS THIS ESSAY
TPREP   OUT     INTEGER  TOT%22<       NO OF PREPOSITIONS THIS ESSAY
TQUOTE  OUT     INTEGER  TOT%19<       NO OF QUOTES THIS ESSAY
TQUES   OUT     INTEGER  TOT%21<       NO OF QUESTION MARKS THIS ESSAY
TRELPR  OUT     INTEGER  TOT%25<       NO OF RELATIVE PRONOUNS THIS ESSA
TSQLET  OUT     INTEGER  TOT% 7<       SUM OF SQ. LETTERS IN EACH WORD
TSEMIC  OUT     INTEGER  TOT%18<       NO OF SEMICOLONS THIS ESSAY
TSPELL  OUT     INTEGER  TOT%24<       NO OF SPELLING ERRORS THIS ESSAY


TSCONJ  OUT     INTEGER  TOT%26<       NO OF SUBORDINATING CONJ THIS ESS
TTITLE  OUT     INTEGER  TOT% 2<       TITLE THIS ESSAY
TTYPE   OUT     INTEGER  TOT%29<       NO OF DECLARATIVE TYPE A SENTENCE
TTYPE   OUT     INTEGER  TOT%30<       NO OF DECLARATIVE B SENTENCES
TTYPE   OUT     INTEGER  TOT%31<       NO OF EXCLAMATORY SENTENCES
TTYPE   OUT     INTEGER  TOT%32<       NO OF QUESTIONS
TUNDER  OUT     INTEGER  TOT%15<       NO OF WORDS ITALICIZED THIS ESSAY
TWDSQ   OUT     INTEGER  TOT% 9<       SUM OF SQ OF WORDS IN EACH SENT
U       CHAR    REAL     ALPHA%21<     UBBBBB
V       CHAR    REAL     ALPHA%22<     VBBBBB
W       CHAR    REAL     ALPHA%23<     WBBBBB
X       CHAR    REAL     ALPHA%24<     XBBBBB
Y       CHAR    REAL     ALPHA%25<     YBBBBB
Z       CHAR    REAL     ALPHA%26<     ZBBBBB

C COMMON IS AN AREA SHARED BY THIS PROGRAM AND ITS SUBPROGRAMS.

COMMON/IN/BROKUP,TEXT,LENGTH PEG I 811-01CO-V s'-ANit-3-u--criAR-4-crEll C-4-k-0-114A-G-E-;--- REAL BROKUP%80< PEG I TEXT IS THE ASSEMBLED WORDS OF THE SENTENCE. DOUBLE PRECISION TEXT%100< ter.-N-Dff 51-1AkES 'THESE L srs WITH --0-TWEA----PRITGWA-M-S---1-1-41/N--6- COMMON/LISTS IN THEIR TEXT. -CdhiMUNTLIgTS/SWiC04,PREP,RELPRO,CONNECOALE,SPELLX,DECLAB,W00"61 PE-G I TYPING ALL WORDS IN THESE ARRAYS AS DOUBLE PRECISION, I.E. THEY CAN HAVE FROM 1-12 CHARACTERS EACH. DOUBLE PRECISION SU8CON%2O

I 5,TDASH,TCOLONtTSEMIC,TQUOTEtTEXCLAITQUEStTPREP/TCONNtTSPELLITRELPRPEGPEG I 611SCONJ,TDALEtTENDPT,TTYPE%4< CUMULATIYE ASPECTS THE EQUIVALENCESTATEMENT IS USEDSO THAT THE CiN BE INCOROORATEd. OF THE RRAV SUMS I fQUIVAL_ENCE_%SUIS%1<,40

%RDTBL%1<,SUBCON<,%RDTBL%41

AN ARRAY%RELN< AND A VARIABLE SHAREDBY OTHER PROGRAMS HAVINGTHE COMMON/PSUM/ STATEMENT. COMMON/PSUM/ RELN,NEXT TYPIqG THE ARRAY RELN AND THE VARIABLE NEXT AS INTEGERS. INTEGER RELN%20<,NEXT PELN %I< CONTAINS THESUMSSUB-tCRIPT CORRESPONDING TO PUNCT%I< AN ARRAY OF 10 WORDS SHARED BY OTHERPROGRAMS HAVING THE COMMON/LIST2/ STATEMENT. COMMON/LIST2/ SWORD TYPING THE ARRAY SWORD AS DOUBLE PRECISION, I.E. EACHWORD CAN HAVE FROM 1-12 CHARACTERS. DOUBLE PAECISION SWORD%10< TYPING THE ARRAY SWRD AS REAL. REAL SwRD%20< EQUIVALENCE %SwORO,SW110< A vAgIABLt SHARE0 BY ALLPROtRAMS HAVING THE COMMON/COUNT/ STATEMENT. COMMON/COUNT/ ICTR TYPING THE VARIABLE ICTR AS AN INTEGER. INTEGER ICTR A LOGICAL VARIABLE SET TRUE OR FALSEDEPENDING UPON.WORD BEING ANALiZED IN THE SENTENCE. LOGICAL XX A LOGICAL VARIABLE SET TRUE OR FALSEDEPEOING UPO-N WHETHER IT IS THE START OF A NEW SENTENCE OR NOT. LOGICAL START INITIALIZING THE IMAGE COUNTER. ICTR#1 SEE MEMO FOR EXPLANATION%UNIVERSITYCOMPUTER SYSTEM 7040-3< CALL -FPTRAP READ IN CARDS CONSISTING OF PUNCTUATIONMARKS AND LETTERS OF THE ALPHABET ACCORDING TO FORMAT STATEMENT899 WHICH SPECIFIES ARRANGEMENT. READ%-l899< OUNCT,ALPHA READ IN CARDS CONTAINING WORDS IN WORDTABLES. THE SPLIT IN THE ARRAY- Ob-Tiit: IS A RESULT OFHAViNG LESS THAN 2000 MISSPELLED WORDS IN THE LIST. READ%5,900< %RDTBLUI

C READ THE FIRST CARD OF THE NEXT ESSAY. IT CONTAINS THE ID NUMBER
C AND INDICATION OF WHETHER OR NOT THERE IS A TITLE TO THIS ESSAY

C READ IN FIRST CARD OF ESSAY CONTAINING IDENTIFICATION NUMBER AND


TITLE%IF ANY.< 2 READ%5,901< IDENTIX AssIqp LOGICALCONSTANT FALSE TOESSEND%ESSAY END.< ESSENDN.FALSE. IF A 999 DETERMINE WHETHERALL THE ESSAYS AREFINISHED. A TEST TO PROCESSED. CARD iS PRESENTTHEN ALL THEESSAYS HAVE BEEN GO TO 200 IF %IDENT.EQ.99999< ASSIGNING THENUMBER THIS STATEMENT SAYSTHERE IS A TITLE dY ONE TO TITLE. TITLEN1 IF TRUE, THIS_LOGICAL TEST THENCHECKS TO SEE IF ATITLE EXISTS. IF -FALSG ITRETAINS IT ASSth-NSfAE iskt6ETERM1NEDVALUtZER0.- THE VALUE ASSIGNEDIN THE PRECEDINGSTEP. IF%X.EQ.BLANK< TITLENO REPLACE ID WITH THEIDENTIFICATION NUMBER. IDNIDENT SENTND%SENTENCE END< ASSIGN LOGICALCONSTANT FALSE TO 'ErsiTND#.FALSE. INITIALIZE THE SEQUENCENUMBER OF THESENTENCE. SENNUMNO INITIALIZE THE PARAGRAPHNUMBER. PARNOM N I CHARACTER. READ THE 30CHARACTER CARD IMAGE,ONE COLUMN PER B-Rok00 REAL35.,902< CONTROL TO A CALLSTATEMENT TO THIS ASSIGNED GOTO TRANSFERS DETERMINE SENTENCEORGANIZATION. GO Tp_loo THIS ESSAY READ IN EACHCHARACTER OF THENEXT LINE OF C OR IF IT IS A NEWPARAGRAPH FINISHTHE C IF IT8t-6-INS. A N-t-WffAY TOTALS ANALYSES OF THEPREVIOUS SENTENCEAND ACCUMULATE C CURRENT SENTENCE OTHERWISE CONTINUEASSEMBLING THIS C IS FOUND.____ C UNTIL SENTENCEENDING PUNCTUATkON READ IN NEXT 80CHARACTER CARDIMAGE. BROKOV 1 NOT EQUAL TO ASTAR OR BLANK IF FIRST ORSECOND CHARACTER IF EIfHER RESPECTIVELY GO TO101. %INCLUSIVE OR< HAS VALUE TRUE OR IF BOTH AANEYB ARE TRUE. A OR B IS TRUE GO TO 101 IF%BROkUO%Ix.NE.STAA.OR.--BROKUO%2<-.NE.BLANk< TRUE. IF ABOVE TESTFALSE THEN SET_ESSAY END_EQUAL_TO -ESSENI-Ji:tkUE. FALSE SET SENTENCEEND EQUAL TOTRUE. C IF ABOVE TEST SENTND#.TRUE. EQUAL TO FALSE. IF NEXT IS QUALTO ZEROSET_SENTNECE END tENTWii.FALSt. SUMS. IF NEXT IS EQUALTO ZERO GO TO15 AND INITIALIZE -----Ii40-(-T;tQ".-6< do fb B- FOR DETERMINATIONOF SENTENCEORGANIZATIJN. C GO TOCALL SNTORG GO TO 100 101 DO 102 ICT11, NOT' 'BLANK do TO 'CALL' SNTORG 'IF 'ANY--eirtRE04k.sT Fteuk IMAGES FOR DETERMINATION,OFSENTENCEORGANIZATION. C. 
  102 IF(BROKUP(ICT).NE.BLANK) GO TO 100
C     INCREMENT PARAGRAPH NUMBER BY ONE.
      PARNUM=PARNUM+1
C     SET SENTENCE END EQUAL TO TRUE.
      SENTND=.TRUE.
      IF(NEXT.EQ.0) SENTND=.FALSE.
C     THIS CALL STATEMENT CALLS THE SUBROUTINE SNTORG WHICH ANALYZES
C     EACH CHARACTER OF THE LINE OF THE ESSAY THAT WAS JUST READ.
C     IT FINDS WORDS AND PUNCTUATION MARKS AND ASSEMBLES THE
C     CHARACTERS INTO SENTENCE COMPONENTS. IT ALSO MAINTAINS A COUNT
C     ON SIGNIFICANT SENTENCE ELEMENTS.
  100 CALL SNTORG
C     IF TRUE GO TO ONE AND READ IN NEXT CARD, IF FALSE CONTINUE WITH
C     NEXT STATEMENT.
      IF(.NOT.SENTND) GO TO 1
C     MM REPRESENTS THE NUMBER OF WORDS IN THE SENTENCE.
      MM=2*NEXT
C     WRITE THE SENTENCE ACCORDING TO THE FOLLOWING FORMAT.
      WRITE(6,906) (HLFTXT(II),II=1,MM)
C     SPECIFIES TEN 12 CHARACTER WORDS PER LINE.
  906 FORMAT(1H-,20A6/(1H ,20A6))
C     INCREMENT THE NUMBER OF SENTENCES BY ONE.
      SENNUM=SENNUM+1
C     THIS SUBROUTINE TYPES EACH SENTENCE ACCORDING TO END PUNCTUATION
C     AND FIRST WORD TYPE.
      CALL TYPSEN
C     THIS SUBROUTINE DETERMINES TYPE OF SENTENCE OPENING.
      CALL SEEOPN

C     CHECK EACH WORD AGAINST THE LISTS OF SPECIAL WORDS AND THE LIST
C     OF COMMONLY MISSPELLED WORDS.
C     ICT IS INDEXED ACCORDING TO THE NUMBER OF WORDS IN THE SENTENCE.
      DO 10 ICT=1,NEXT
C     TESTING EACH WORD OF THE SENTENCE. IF LENGTH(ICT) EQUAL TO 99
C     IT IS THE END OF THE SENTENCE.
      IF(LENGTH(ICT).EQ.99) GO TO 10
C     SET XX TRUE FOR FIRST OR SECOND WORD OF THE SENTENCE. THIS WILL
C     PREVENT IT FROM BEING CHECKED AGAINST THE LIST OF RELATIVE
C     PRONOUNS.
C     REPLACING XX BY ONE OR TWO FOR TEST IN SECOND STATEMENT BELOW.
      XX=ICT.EQ.1.OR.ICT.EQ.2
C     REFERS TO A SUBROUTINE WHICH DETERMINES THE TYPE OF WORD.
      CALL CHKLST(TEXT(ICT),XX)
C     SEE COMMENT ABOVE WHICH STARTS WITH - SET XX TRUE FOR FIRST, ETC.
      IF(XX) GO TO 10
C     REFERS TO A SUBROUTINE WHICH CHECKS FOR SPELLING.
      CALL SPELX(TEXT(ICT))
   10 CONTINUE
C     PRINT THE RESULTS OF THE ANALYSIS OF THIS SENTENCE AND, IF IT IS
C     THE LAST SENTENCE OF THE ESSAY, ALSO PRINT THE TOTAL RESULTS FOR
C     THE [illegible] ESSAY.
C     WRITE SUMS FOR LINE ON TAPE 0.
      WRITE(0) (SUMS(II),II=1,34)
C     PRINT SUMS FOR LINE.


WRITE%6,903< %SUMS%II<,II#1,34< ACCUMULATE TOTALS DO Ll JCTN1,4 11 TOT%1--C.kitSUMSUCT< DO 12 ICT#5,34 12 TOttIC-f

IF TRUE GO TO SNTORG. IF FALSE, THEN WRITE TOTALSFOR ESSAY. I--*,--N-UT...tt-t-END..-AWO,ICTR.GT.80< GO ESSEND< GO TO 100 WRITING ACCUMULATED TOTALS FORAN ESSAY ON TAPE UNIT 0, WRITE%0< ZATUI

12#2 TOT%3< IS NUMBER OFSENTENCES THIS ESSAY. SENNTOT%3<.___ TOT%8< IS NUki3tRor WORDS THIS ESSAY. WDSNTOT%8< TOT%2< REPRESENTS TITLE THISESSAY. X11%101TOT%2< A RATIO SCALE FOR ONE OFTHE VARIABLES. X1IA2PSIEN<44190 TOT%4< IS NUMBER OFPARAGRAPHS THIS ESSAY. X.11%3

DO 511 II#6,14 511 X11%II


CONJUNCTIONS, AND DALE WORDS FOR THEESSAY. DO 513 11413,23 513 X1IAII<4%FLOAT%TOTUI&4<

REPLACE START BY FALSE SO THATTHE PROGRAM WILL NOTINTERPRET START AS THE BEGINNING OFANOTHER LINE. 16 START.4.FALSEe NO46ER;-tifLEOENct Ji0A-6-Ek-ot"

C     SENTENCE, AND SEQUENCE NUMBER OF PARAGRAPH.
      DO 14 ICT=1,4
   14 SUMS(ICT)=0
C     THE DO 17 LOOP INITIALIZES THE CUMULATIVE ASPECTS OF THE
C     PROGRAM FOR EACH ESSAY.
      DO 17 ICT=1,[illegible]
   17 TOT(ICT)=0
C     BEGIN ANALYZING THE ESSAY BY GOING TO 2.
      GO TO 2
C     SET IDENTIFICATION NUMBER EQUAL TO TOT(1).
  200 TOT(1)=IDENT
C     WRITE THE TOTALS FOR ESSAY ON TAPE 0.
      WRITE(0) (TOT(II),II=1,34)
C     WRITE THE TOTALS FOR THE ESSAY ON TAPE 4.
      WRITE(4) (TOT(II),II=1,34)
C     LAST ENTRY MADE ON TAPE 0.
      END FILE 0
C     LAST ENTRY MADE ON TAPE 4.
      END FILE 4
C     REWIND TAPE 0 (END OF JOB)
      REWIND 0
C     REWIND TAPE 4 (END OF JOB)


REWIND 4 RETURN ANDALPHA.. _ PUNCT . WORDS FOR _ . _ SPECIFIES 6 CHARACTER . _ _ 899 FORMAT%A6<- CHARACTER WORDS PERLINE FOR WORDLISTS. C SPECIFIES SIX 12 900 FORMAT212A6< SKIP 2 SPACES,_TITLE SKIP 7 SPACES,IDENTIFICATION NUMBER, 901 FORMATZI5,9X,A1< EACH. SPECIFIES 80SUCCESSIVE FIELDS OFONE CHARACTER 902 FORMAT%80Al< SPECIFIES PRINTOUT FOR SUMS ANDTOTS. 903 FORMATUHOI5/12,313,215,215,1713,14,512,213< SPECIFIES PUNCHEDCARD OUTPUT FORAN ESSAYXTOTALS< 904 FORMAT2I501,F4.0,F5.0,F4.0,12F5.0< SPECIFIES PRINT OUTFOR-A4 ESSAIMOTALS< 965 FORMAT%1XI501,15F5.0 SPECIFIES 20INTEGERS%1 OR 2DIGITS<-FOR REIN. 905. FORMAT%20I2< EVD


SIBFIC SNTRG SUBROUTINE SNTORG isa COMMON/IN/BROKUP,TEXT,LENGTH _REAL BR.OKUP.X80


PEG I 7PR

4ER<,%TOT%16<,TDASH

AN AREAS'HAREb BY THIS SUBROUTINE ANDTHE MAIN PROGRAM. _COMMON/COUNT/ ICTR TYPING I4AGE COUNTER AS ANINTEGER. INTEGER ICTR TYPING THE VARIABLES ONEtTWO, AND THREE ASLOGICAL. _LOGICAL ONE,TWO,THREE TYPING CT AND IRELN ASINTEGERS.

INTEGER CTORELN_ TYPING TEMPA ASA'DOUBLE PRECISION WORD. DOUBLE PRECISION TEMPA TYPING THE ARRAY TEMPBAS REAL. REAL TEMP8%2< MAKiNd-f6i0A AND TEMPBEQUIVALENT. EQUI.VALENCE %TEMPB,TEMPA< TYPSENZTYPE OF AN AREA-tHARED BY THIS SUBROUTINEAND SUBROUTINE SENTENCE.

COMMON/ENDS/ALSPER,ALSEXCtALSQUS 4


c TYPING THE ARRAYS ASREAL. ALSPER%2<,ALSEXCU

"3 BRKUP%1<#BROkUP%ICtik BRKUP%20BROKUPUCTIMX BRKUP%3<#BROKUPUCTR2< IFUCTR.GT.79


IFMTEMPB%1<.EQ.PERIOIX.OR.%TEMPB%1<.EQ.EXCLAM<.0R.%TEMPB%1<.EQ.Q 1UEST<<.AND.LCTR.EQ.0( GO TO 100 SAME AS COMMENT ABOVE. IFTEMO8%1(.EQ.ALSPER<.0R.VEMP13V<.EQ.ALSEXC<.OR.%TEMPB$1<.EQ. 1ALSQUS<<.AND.LCTR.EQ.0< GO TO 100 REPLACE THREE BY FALSE. THREE#.FALSE. CHECKING. IF LETTER COUNTERNOT EQUAL TO ZERO GOTO 4 FOR FURTHER IFU.CTR.NE.0< GO TO 4 IF TEMPB$1< EQUAL TO ANITALIC MARK GO TO 100FOR FURTHER PROCESSING. IFfrfff0e%1<.EQ.ITALIC< GO TO 100 CALL PACKURKUP%l<,TEMPA,2< REPLACE TWO BY FALSE. TWO #.FALSE. THE b6.5 COOPSEIS UP THE APPROPRI-ATEINDEX FOR STATEMENITS FOLLOWING STATEMENT 100AND ALSO DETERMINES IFTEMPB%1< IS EQUAL TO APERIbb,COLON,-SEMICOUONI-EXCLAMATION,QUESTION, DASH, ITALIC, QUOTE, OR APER CENT. DO 5 CT#10,18 IRELN#RELNUT< .IF"tTEAPB%1<.EQ.PUNCT%CT<<--60 T.0 100 5 REPLACE TWO BY TRUE. -twb W.TRUE. REPLACE ONE BY FALSE. ONE#.FALSE. AN ASSIGNED GO TOSTATEMENT WHICH ELIMINATESFOLLOWING TEST. GO TO 10d _IF IMAGE IS EQUAL TOAN APOSTROPHE GOTO 500. 4 IF%ii-R-OK0i5iICtR<.EQ.APOSTR< 60 Tb-500 GO TO 200 ASSIGN THE MINIMUMVALUE OF THE TWOARGUMENTS EQUAL TO NEXT. 100 NEXT#MIN0NEXT&1,100< REPLAtE LENGTH BY 99 WHICHIS tHE. ENDOF'THE SENTENCE. LENGTMEXT<#99 -MARKS. IF TWtISTRUE GO TO 101 ANDBEGIN SUMMING P-UNCTUA-fION GO TO 101 IFUWO< INDICATED BY THE SUM THE APPROPRIATEPUNCTUATION MARKS AS SUBSCRIPT IRELN. SUMSURELNOSUMSURELN


IF TRUE INCREASE THENUMBER OF EXCLAMATIONSBY ONE. NEXCLIONEXCLAU IFUEMPB%1<.EQ.4LSEXC< MARKS BY ONE. IF TRUE INCREASE THENUMBER OF QUESTION IF%Tt-mP8i1<.EQ.QUEST< NQUES#NQUES&I. MARKS BY ONE. IF TRUE INCREASE THENUMBER OF QUESTION ITEAPB%1<.EQ.ALSQUS

300 SUMLETNSUMLEU1 FINDING THE LENGTH OF A WORD. LCTROLCTRE.1 SUMMIN6 tHE IMAGES ON A CARD TO DETERMINE WHEN ALLIMAGES HAVE BEEN PROCESSED. ICTO#ItTktf. CONTINUE WITH NEXT IMAGE. GO TO 1 SUMMING THE IMAGES ON A CARD TODETERMINE WHEN' ALL IMAGES HAVE i'3EN-OROCtSSED. 400 ICTR#ICTRE1 C CO'NfINUE--WITH NEXT IMAGE. QO TO 1 INCREASE THE NUMBER OFAPOSTROPHES BY ONE. 500 NAPOS#NAPOM INCRIMENT.LETYERtOUNTER BY ONE.--- LCTR#LCTRE1 .1--NtattzfEtifrmAt -ebbriftk ICTR#ICTR&1 CONTINUE WITH PROGRAM. GO TO 1 'EN6 SIBFTC SP SOELXX ZWORD< ----SUBROUt1Nt PEG I COMMON/IN/BROKUP,TEXT,LENGTH REAL BROKUP%80< PEG I DOUBLE PRECISION TEXT%100< COMg6N/LItfS/SUBCON05-REP,RELVROCONNECOACE.,SPELLX-9DECLAB,SVOPN -0E-G I DOUBLE PRECISION SUBCON%20<,PREP%50<,RELPRO%10<,CONNEC%30

59TDASH,TCOLON,TSEMIC,TQUOTE,TEXCLAJQUES,TPREP,TCONN,TSPELL,TRELPRPEG_ _ _ . -607kONJ,TDALEtTENDPT,TTYPE%4< PEG I EQUIVALENCE%SUMS%1<,ID


PEG I 4CT<,%SUMS%15

V ONE. V 2 NSPELL#NSPELLE1 C RETURN TO MAIN PROGRAM. RETURN END $IBFTC TSEN


SUBROUTINE TYPSEN COMMON/IN/BROKUP,TEXTgLENGTH PEG I REAL_BROKUP$80< PEG I DOUBLEJOECISIONTEXT%100< COMMON/LISTS/SUBCONOREP,RELPRO,CONNECOALE,SPELLX,DECLABgSV:IPN PEG I -DoOffiLt-ORECISION SUBCONV(kgPREP$50

REAL HLFTXT%200< EQUIVALENCE %HLFTXT/TEXT< COMMON/LOG/SENTND/ESSENO Lbort-AL "S-ENTND/E-SSEND

_EQUIVALENCE%.PUNCTUT

C     THIS PART TYPES EACH SENTENCE ACCORDING TO ITS END PUNCTUATION
C     AND FIRST WORD TYPE.
C     REPLACE LAST BY TWO TIMES NEXT MINUS ONE.
      LAST=2*NEXT-1
C     IF SENTENCE ENDS WITH A PERIOD GO TO 100 TO DETERMINE TYPE OF
C     DECLARATIVE SENTENCE.
      IF(HLFTXT(LAST).EQ.PERIOD.OR.HLFTXT(LAST).EQ.ALSPER) GO TO 100
C     IF SENTENCE ENDS WITH AN EXCLAMATION MARK INCREASE SENTENCE
C     TYPE (3) BY ONE.
      IF(HLFTXT(LAST).EQ.EXCLAM.OR.HLFTXT(LAST).EQ.ALSEXC

. PEG I COMMON/CHAR/PUNCTIALPHA_ _ _ REALPUNCTX20


EQUIVALENCE%PUNCTIl1

EQUIVALENCE ULFTXT,TEXT COAMON/LOG/SENTNO,ESSEN6 LOGICAL SENTNOtESSEND EQUIVALENCE%PUNCTX17


REAL SWRO%20< EQUIVALENCE %SWORD/SWRD<

C     THIS PART CLASSIFIES WORDS OF THE SENTENCE AS PREPOSITIONS,
C     RELATIVE PRONOUNS, SUBORDINATING CONJUNCTIONS, CONNECTIVES,
C     WORDS ON THE DALE LIST.
C     TYPING WORD AS DOUBLE PRECISION, I.E. IT CAN CONTAIN FROM 1-12
C     CHARACTERS.
      DOUBLE PRECISION WORD
C     TYPING XX AND YY AS LOGICAL VARIABLES.
      LOGICAL XX,YY
C     TYPING THE FUNCTION INTABL AS LOGICAL.
      LOGICAL INTABL
C     REPLACING THE VALUE OF YY BY THE VALUE OF XX.
      YY=XX
C     REPLACING XX BY THE LOGICAL CONSTANT TRUE.
      XX=.TRUE.
C     TEST TO DETERMINE IF THE WORD IS A PREPOSITION.
      IF(INTABL(WORD,PREP,100)) GO TO 100
C     YY WILL BE TRUE FOR FIRST AND SECOND WORD OF THE SENTENCE.
C     AVOIDS CHECKING THESE WORDS AS BEING RELATIVE PRONOUNS.
      IF(YY) GO TO 6
C     TEST TO DETERMINE IF THE WORD IS A RELATIVE PRONOUN.
      IF(INTABL(WORD,RELPRO,20)) GO TO 200
C     TEST TO DETERMINE IF THE WORD IS A SUBORDINATE CONJUNCTION.
    6 IF(INTABL(WORD,SUBCON,40)) GO TO 300
C     TEST TO DETERMINE IF THE WORD IS IN THE LIST OF DALE WORDS.
      IF(INTABL(WORD,DALE,6000)) GO TO 400
C     TEST TO DETERMINE IF THE WORD IS A CONNECTIVE.
      IF(INTABL(WORD,CONNEC,60)) GO TO 500
C     ASSIGN THE LOGICAL CONSTANT FALSE TO XX.
      XX=.FALSE.
C     RETURN TO CALLING PROGRAM.
      RETURN
C     INCREMENT NUMBER OF PREPOSITIONS BY ONE.
  100 NPREP=NPREP+1
C     ASSIGNED GO TO FOR ANOTHER INCREMENT.
      GO TO 400
C     INCREMENT NUMBER OF RELATIVE PRONOUNS BY ONE.
  200 NRELPR=NRELPR+1
C     ASSIGNED GO TO FOR ANOTHER INCREMENT.
      GO TO 400
C     INCREMENT NUMBER OF SUBORDINATE CONJUNCTIONS BY ONE.
  300 NSCONJ=NSCONJ+1
C     INCREMENT THE NUMBER OF DALE WORDS BY ONE.
  400 NDALE=NDALE+1
C     TEST TO DETERMINE IF THE WORD IS A CONNECTIVE.
      IF(INTABL(WORD,CONNEC,60)) GO TO 500
C     RETURN TO CALLING PROGRAM.
      RETURN
C     INCREMENT NUMBER OF CONNECTIVES.
  500 NCONN=NCONN+1
C     RETURN TO CALLING PROGRAM.
      RETURN
      END
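The branch order of the word-classification subroutine above can be restated compactly. The following is a minimal Python sketch, not the report's own code: the function name `classify_word`, the dictionary keys, and the use of Python sets in place of the packed word tables are assumptions made for illustration. It keeps the listing's behavior that any word found in the preposition, relative-pronoun, or subordinate-conjunction lists is also counted as a Dale word, and that the first two words of a sentence are never tested as relative pronouns.

```python
from collections import Counter


def classify_word(word, first_or_second, lists, counts):
    """Update counts for one word; return True if the word was in any list.

    Mirrors the GO TO structure of the listing (labels 100-500).
    `lists` maps hypothetical keys to sets of upper-case words.
    """
    w = word.upper()
    if w in lists["prep"]:                      # label 100
        counts["prep"] += 1
    elif not first_or_second and w in lists["relpro"]:   # label 200
        counts["relpro"] += 1
    elif w in lists["subcon"]:                  # label 300
        counts["subcon"] += 1
    elif w in lists["dale"]:                    # straight to label 400
        pass
    elif w in lists["connec"]:                  # label 500 only
        counts["connec"] += 1
        return True
    else:
        return False                            # XX = .FALSE.
    # Label 400: count as a Dale word, then test for a connective.
    counts["dale"] += 1
    if w in lists["connec"]:
        counts["connec"] += 1
    return True


if __name__ == "__main__":
    demo_lists = {
        "prep": {"AT"}, "relpro": {"WHICH"}, "subcon": {"BECAUSE"},
        "dale": {"ABLE"}, "connec": {"HOWEVER"},
    }
    demo_counts = Counter()
    for position, token in enumerate("which dog ran at me".split(), start=1):
        classify_word(token, position <= 2, demo_lists, demo_counts)
    print(dict(demo_counts))
```

The boolean return value plays the role of XX in the listing: the caller skips the spelling check for any word that was found in one of the lists.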


IBFTC SYB SUBROUTINE SEEOPN COMMON/IN/BROKUPtTEXTtLENGTH PEG I REAL--8ROKUP$90< PEG I DOUBLE PRECISIONTEXT%100< COM4d6AIStS/SUBCONOREPIRELOROICONNECtDALEISPELLXIDECLA8tSVOPN PEG I DOUBLE PRECISION SUBCON$20

51TDASHITCOLON,TSEMICITQUOTEITEXCLAITQUEStTPREPITCONNITSPELLITRELPRPEG_ I _ _ . _ . _ 6ITSCONJ,TDALEtTENDPTITTYPE%4< PEG I EQUIVALENCE%SUMS%1


2%10221<,DECLA8<,tRDTBL%10241<,SVOPN< PEGI REAL HLFTXT%200< EQUIVALENCE %HLFTXT,TEXT< COMMON/LOG/SENTNO,ESSEND LOGICAL SENTND,ESSEND EQUIVALENCE%PUNCT%17

TYPING THE ARRAY LETTER AS REAL. REAL. LETTER%12< TYPING THE FUNCTION INTABL AS LOGICAL. LOGICAL INTABL INITIALIZE SUBJECT-VERB VARIABLE.

TEST TO DETERMINE IF FIRST WORD ENDS IN A S,IF SO, RETURN TO

      IF(INTABL(TEXT,SWORD,20)) RETURN
C     REPLACE ICT BY LENGTH(1).
      ICT=LENGTH(1)
C     REPLACE LL BY ICT MINUS FIVE.
      LL=ICT-5

      CALL UNPACK(TEXT(1),LETTER,12)
C     PACKS WRDEND WITH WORD ENDING DESIGNATED BY LETTER.
      CALL PACK(LETTER(LL),WRDEND,6)
C     THIS CALL STATEMENT SETS ENDING EQUAL TO ATION.
      CALL DATA(ENDING,6HATION )
C     TEST TO DETERMINE IF WRDEND IS SAME AS ATION.
      IF(ENDING.EQ.WRDEND) GO TO 3
C     THIS CALL STATEMENT SETS ENDING EQUAL TO OLOGY.
      CALL DATA(ENDING,6HOLOGY )
C     TEST TO DETERMINE IF WRDEND IS SAME AS OLOGY.
      IF(ENDING.EQ.WRDEND) GO TO 3
C     THIS CALL STATEMENT SETS ENDING EQUAL TO SHIP.
      CALL DATA(ENDING,6HSHIP  )
C     TEST TO DETERMINE IF WRDEND IS SAME AS SHIP.
      IF(ENDING.EQ.WRDEND) GO TO 3
C     THIS CALL STATEMENT SETS ENDING EQUAL TO MENT.
      CALL DATA(ENDING,6HMENT  )
C     TEST TO DETERMINE IF WRDEND IS SAME AS MENT.
      IF(ENDING.EQ.WRDEND) GO TO 3
C     TEST TO DETERMINE IF WORD BEING ANALYZED IS IN S-V LIST.
      IF(INTABL(TEXT(1),SVOPN,300)) GO TO 3
C     UNPACK LETTER.
      CALL UNPACK(TEXT(1),LETTER,12)
C     IF LAST LETTER NOT EQUAL TO S RETURN TO CALLING PROGRAM.
      IF(LETTER(ICT
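The ending tests in the sentence-opening subroutine above compare the tail of the first word against four fixed endings (ATION, OLOGY, SHIP, MENT) and then look at whether the word ends in S. A minimal Python sketch of those two checks follows. The function names are hypothetical, and the sketch uses a plain suffix comparison rather than the listing's fixed six-character packed field, so it should be read as an approximation of the intent rather than a transcription. What the program does after reaching statement 3 is not visible in this part of the listing and is not modeled here.

```python
# Endings taken from the CALL DATA statements in the listing.
LISTED_ENDINGS = ("ATION", "OLOGY", "SHIP", "MENT")


def has_listed_ending(word, endings=LISTED_ENDINGS):
    """Return True if the word ends in one of the four listed endings."""
    return word.upper().endswith(tuple(endings))


def ends_in_s(word):
    """Return True if the last letter of the word is S."""
    return word.upper().endswith("S")


if __name__ == "__main__":
    for first_word in ("Government", "Biology", "Dogs", "Run"):
        print(first_word, has_listed_ending(first_word), ends_in_s(first_word))
```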


$IBFTC TLU
      LOGICAL FUNCTION INTABL(A,B,N)
      COMPLEX A,B(8192)
      INTEGER N
C     LOGICAL COMPARISON SUBROUTINES.
      LOGICAL EQA,GTA,LTA
C     BINARY SEARCH - A IS THE ARGUMENT, B THE TABLE, N THE TABLE
C     LENGTH.

--VOLktEllitIACUEOFNBY THF V-ALUE-IN M- 11-1IVOED. BYtwo. Now At-PL.-ICE 1-tiT ABL *B-i -THE-CbdraL taiSradifFAL-St: INTABL#.FALSE. REPLACE THE VARIABLE J BY THE CONSTANT 4096. J#4096 REPLACE THE'VALUE.OP- BY'NC VALUE: tw J. K#J C TE ST TO DETERMINE IF J EQUAL TO ZERO.. IF SO, RETURN TO CALLING PROGRAM. 1 IFXJ.EQ.0< RETURN REPLACE .T.HE_ VALUE PF J By .ThE yALy_of J pIvIp_pBY TWO. iijii REPLACE J._ayTHE. MINIMUM VALUE OF.THE.TWOARGUMENT:1S! R -151 N. L#MINOXK,N< IFUTAZREALZA
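The legible statements of the table-lookup function above (J set to 4096, K set to J, J halved on each pass, the probe index taken as the minimum of K and N, and a return once J reaches zero) suggest a binary search that steps by decreasing powers of two. A minimal Python sketch of that scheme is given below. The function name `in_table` is hypothetical, the table is assumed to be sorted in the same order used by the comparison, and Python string comparison stands in for the machine-word comparison routines (EQA, GTA, LTA) of the listing, whose collating order may differ.

```python
def in_table(word, table, start=4096):
    """Power-of-two stepping binary search over a sorted list.

    Probe positions are 1-based, as in the listing, and are clamped
    to the table length so short tables can share one start value.
    """
    n = len(table)
    if n == 0:
        return False
    j = start          # current step size
    k = j              # current probe position
    while j != 0:      # the listing returns when J reaches zero
        j //= 2
        probe = min(k, n)
        entry = table[probe - 1]
        if word == entry:
            return True
        if word > entry:
            k += j     # move toward the end of the table
        else:
            k -= j     # move toward the start of the table
    return False


if __name__ == "__main__":
    prepositions = sorted(["AT", "BY", "FOR", "FROM", "UP", "WITH"])
    print(in_table("FROM", prepositions), in_table("ZEBRA", prepositions))
```

With a start value of 4096 the probe position can range from 1 to 8191, which is consistent with the table dimension of 8192 declared in the listing.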


REAL LSHIFT DIMENSION AX4<,B*100< 00 1 I#1,N J# %I-1

DO I I#1,M J#XI-1V6E1 MODULO FUNCTION WHEREININTERESTED IN REMAINDEROF I-1 DIVIDED BY 6. K#MODXI-1,6( IFXK.EQ.0< MOO. D#Atf< IFXI.GT.N< 0#.5

I. 8XJ<#0RUXJ

      ANA     10770000000000
      LGR     **
      SLW     TEMP
      CLA     TEMP
      LDQ     HOLD
      RETURN  RSHIFT
TEMP  BSS     1
HOLD  BSS     1
      END
$IBFTC DATSIN
      SUBROUTINE DATA(A,B)
C     TYPING THE ARGUMENTS A AND B AS REAL.
      REAL A,B
C     SET ONE ARGUMENT EQUAL TO THE OTHER.
      A=B
C     RETURN TO CALLING PROGRAM.
      RETURN


SIBMAP LCMP

* LOGICAL COMPARISONSUBROUTINES

ENTRY GTA

CAL* 3,4 LAS* 4,4 TkA RAI NOP ZAC RETURN GTA RAI CLS #0 RETURN GTA ENTRY .G.EA GE4- SAVE 4 CAL* .3,4

TRA RA2 TRA RA2 ZAC RETURN GEA RA2 CLS #0 RETURN--6t4 ENTRY LTA LTA SAVE 4 .

CAL*. _ _ _ . _ _4,4 LAS* 3,4 TRA RA3 NOP ZAC RETURN LTA RA3 CLS #0 RETURN LTA ENTRY LEA LEA SAVE 4 CAL* 4,4 LAS* 3,4 APPENDIX A(Continued) -

TRA RA4 TRA RA4 ZAC - RETURN LEA RA4 CLS *0 RETURN LEA ENTRY EQA EQA SAVE 4 CAL* 3,4 LAS* 4,4 TRA *E2 TRA RA5 ZAC RETURN EQA RA5 CLS *0 RETURN EQA ENTRY NEA NEA SAVE 4 CAL* 3,4 LAS* 4,4 TRA *E7 TRA RA6 CLS *0 RETURN NEA RA6 ZAC RETURN NEA END SIBMAP SETFP ENTRY FPTRAP FPTRAP SAVE 4 AXT -1,4 SXA SETFP.E14,4 RETURN FPTRAP EXTERN SETFP. END SENTRY PEGI X V 0 -6 A


AFTER ALTHOUGH AS BECAUSE BEFORE IF SINCE THAN THOU6R- UNLESS UNTIL --WHENEVEk WHEN WHERE WHEREVER WHILE MIUMZUZUMZZUZI UZZIZZIZZiniliiiifiZZZAEHJUT. At-o-Vt-- BEHIND ALONG AMID AMONG AROUND AT BUT BELOW BENEATH BESIDE BETWEEN BEYOND DURING EXCEPT FOR CONCERNING DOWN . ._._ BY ...... _ FROM UNDERNEATH OVER PAST THROUGHOUT THROUGH TOWARD WITHIN WITHOUT uN6Eik' ----Afel UPON UP WITH UMUZZLZZZZZZIZIZZUZZZIZZZZUZZZZIZZZZZZZZIZZZUZIZZIZZZ MIZZZZZIZZUZZZIZZUZZZZIZZZZZZZZZZIZZUZZZIZZTHAT WHICH WHOM WHO WHOSE ZZZZZZZZZIZZZZZZZZZZZZIUZZUZZZZZZI --CONtEQUEMLYFIFTN ZZUZZZUZZiZZZt:Lit-i-ZLZACC6k-DING-LY-4rfERWARD HOWEVER FINALLY FIRST FOURTH FURTHERMORE HENCE --al-STEAD iIKEWISE SiEARWTICt MbikeNtk-- NIWOTHELESS INC/E-D THUS OTHERWISE SECOND SIMILARLY THEREFORE THIRD YET ZUZZIZUZZZZZZZZUZZZZUZZZZZZLIZZIMUZZZZZZZZZZZZIZZUZZ 4UZZZUZZZZZZZZIZZ1Z4Z TIS ABLE ABOARD ABOUT ACHE ---A6S-E.NT 'ACCEPT' ACtIDENT ACCOUNT AB-WE ACTS ACHING ACORN ACRE ACROSS ACT ADD 401-Rt -ADVENTURE- 'OAR A-FRAM-- kffiffgt-t-- AGAINST AFTERNOON AFTER AFTERWARD AFTERWARDS AGAIN AHEAD AN AGED AGE AGO AGREE AIRPORT AIR AID AIM AIRFIELD AIRPLANE ALARM A-LIKE ALIVE ALLEY ALONG ALLIGATOR ALLOW ALL ALMOST ALONE ALOUD ALREADY ALSO -ALWAYS AMEGCAW'-- Oforia ANGER ANGRY AMOUNT AM AND ANGEL ANT ANYBODY ANIMAL ANOTHER AN ANSWER


ANYHOW ANYONF ANY ANYTHING ANYWAY ANYWHERE APARTMENT APART APF APIECE APPEAR APPLE APRIL ApRON AREN T ARE ARISE ARITHMETIC ARMFUL ARM ARMY AROSE AR-OUND ARRANGE ARRIVED _ARRIVE ARROW ARTIST ART A ASHES ASH' ASIDE ASK ASLEEP AS ATE AT ATTACK ATTEND ATTENTION AUGUST AUNT AUTHOR AUTOMOBILE AUTO AUTUMN AVENUE AWAKEN AWAKE AWAY AWFULLY AWFUt AWHILE AX 6-AA- .6.4 t BABIES BABY BACKGROUND BACK BACKWARD BACKWARDS BACON BADGE BADLY a4o BAG BAKE BAKING BALLOON BALL BANANA BANDAGE BAND BANG BANJO BANKER BANK BARBER B REFOOT BARELY BARE BARK BARN BARREL BAR BASEBALL BASEMENT BASE BASKET BATCH BATHE -BAtHING BATHROOM BATH BATHTUB BAT BATTLE BATTLESHIP BAY- BEACH BEAN- BEARD BEAR BEAST BEATING BEAT BEAUT I FUL BEAUTIFY BEAUTY BECAME BECAUSE BECOME BECOMfNG BEDBUG BEDTIME BEECH BEEF BEDROOM BED BEDSPREAD _ .... _ 3E-tHIVE -6E04 BEER BEE BE-ESTE-Aik--BEET BEFORE BEGAN BEGGAR BEGGED BEGINNING BEGIN BEG- BEGUN BEHAVE BEHIND BEING BELIEVE BELL BELONG BELOW BELT BENCH BEND BENEATH BENT BERRIES BERRY BE BESIDE BESIDES BEST BET BETTER BETWEEN BIBLE BIB BICYCLE BID BIGGER BIG BILLBOARD BILL BIND BIN BIRD BIRTHDAY BIRTH BISCUIT BITE BITING BIT BITTER BLACKBERRY BLACKBIRD BLACKBOARD BLACKNESS BLACK BLACKSMITH BLAME BLANKET BLANK BLAST BLAZE BL ED BLET5TWO BLESS BLEW BLINDFOLD BLIND BLINDS BLOCK BLOM BLOOM Blossom BLOT BLOW BLUEBERRY BLUEBIRD BLUEJAY BLUE BLUSH BOARD BOAST BOAT BOB BOBWHITE BODIES BODY BOILER BOIL BOLD BONE BONNET BOOKCASE BOOKKEEPER BOOK BOOM BOO BOOT BORN BORROW poss BOTHER BOTH BOTTLE_ BOTTOM BOUGHT . . BOUNCE BOW-WOw BOWL BOW_ BOXCAR OXER BOXES BOX BOYHOOD BOY BRACELET BRAIN 6itAkE "BRANCH "IBRAN BRASS------BRAVE EAt5 BREAKFAST BREAK BREAST BREATHE BREATH BREEZE BRICK BRIDE BRIDGE BRIGHTNESS BRIGHT BRING BROADCAST BROAD BROKEN BROKE BROOK BROOM 6ii-tfti-IER- --6ktifidHf --BROWN- BRUSH BOBLE 'BUCKET BUCKLE BUD _BUFFALO BUGGY BUG BUILDING BUILD BUILT B016-- BULE-EY WU. 
BUMBLEBEE BUMP BUM BUNCH BUNDLE BUNNY BUN BURN BURST BURY BUSHEL BUSH BUSINESS BUS BUSY BUTCHER BUT BUTTERCUP BUTTERFLY BUTTERMILK BUTTER BUTTERSCOTCH BUTTONHOLE BUTTON BUTT BUY BUZZ BYE BY CABBAGE CABINET CABIN CAB CACKLE CAGE CAKE CALENDAR CALF CALLER CALLING CALL CAMEL CAME CAMPFIRE CAMP CAN T CANAL CANARY CANDLE

CANDLESTICK CANDY CANE CANNON CANNOT CANOE CAN CANYON CAPE TATITAL . CAP CAPTAIN-- CARDBOARD CARD CAREFUL CARELESSNESSCARELESS CARE CARLOAD CARPENTER CARPET CARRIAGE CARROT CARRY CAR CART CARVE CASE CASHIER CASH CASTLE tATBIRD CATCHER 'CATCH CATERPILLAR CATFISH CAT CATSUP CATTLE CAUGHT CAUSE CAVE CEILING CELLAR CELL CENTER CENT CERAL CERTAINLY CERTAIN CHAIN CHAIR CHALK CHAMPION CRY CUB CUFF CUPBOARD CUPFUL CUP CHANCE CHANGE CHAP CHARGE CHARM CHART CHASE -Ciat-tER CHEAP CHEAT CHECKERS CHECK CHEEK CHEER CHEESE CHERRY CHEST CHEW CHICKEN CHICK CHTEUR-OOD CHILDREN CHIU5------CHILL CHILLY CHIMNEY CHINA CHIN CHIPMUNK CHIP --CROCOLATE -CHOICE CH065E- .CHOP CHORUS CHOSEN CHOSE CHRISTEN CHRISTMAS CHURCH CHURN CI6ARETTE CIRtCE CMOS- CrTIZEN CITY- tLANG CLAP CLASSMATE CLASSROOM CLASS CLAW CLAY CLEANER CLEAN CLEAT CLERK CLEVER CLIFF CLIMB CLIP CLOAK CLOCK CLOSE CLOSEf 'CLOTHES- cO-THING "Crat-ff "-CCM b 'CLOUDY CLOVER CLOWN CLUB. CLUCK CLUMP COACH . _ . COAL COASf COAT c0861ER -CU CocbA COCONUT COCOON CODFISH COD COFFEEPOT COFFEE COIN COLD COLLAR COLLEGE COLORED COLOR COLT COLUMN COMB COME COMFORT COMIC COMING 'COMPAN'Y COMPARE clikbUcTolk--- -CONE- tONNECT COOKED COOKIE COOKIES COOKNG COOK COOKY COOLER ttia 'cbbc -ttiO COPPER -C-OPY CORD CORK CORNER CORN CORRECT COST COT COTTAGE COTTON COUCH COUGH COULON I COULD COUNTER COUNTRY .COUNT COUNTY COURSE COURT 'COUSIN tOVER COWARDLY Rb COWBOY COW _COZY CRAB CRACKER CRACK CRADLE CRAMPS CRINBERRY CRANK CRANKY OASH CRAWL CRAZY CREAM CREAMY CREEK CREEP CREPT CRIED CRIES CROAK CROOKED CROOK CROP CROSSING CROSS7EYED CROSS CROWDED CROWD CROWN CROW CRUEL CRUMBLE CRUMB CRUS-H CRUST CRY CUB CUFF CUPBOARD CUPFUL CUP CURE CURL' CURLY' CURTAIN CURVE CUSHION CUSTARD CUSTOMER CUTE CUT CUTTING DAB DADDY DAD DAILY DAIRY DAISY DAMAGE DAME_ DAMP DAM DANCER DANCE DANCING DANDY DANGEROUS 'DANGER DARE DARKNESS DARLING DARN DART DASH DATE DAUGHTER DAWN DAYBREAK DAY ,DAYTIME DEAD DEAF DEAL DEAR DEATH DECEMBER DECIDE DECK DEED DEEP DEER DEFEAT DEFEND 
DEFENSE DELIGHT DEN DENTIST DEPEND DEPOSIT DESCRIBE DESERT DESERVE DESIRE DESK DESTROY DEVIL DEW DIAMOND DIDN T DID DIED DIE DIES DIFFERENCE DIFFERENT DIG DIME DIM DINE DINGDONG DINNER DIP DIRECTION DIRECT DIRT DIRTY DISCOVER DISH DISLIKE DISMISS DITCH


DIVEP DIVE DIVIDE DOCK DOCTOR DOESN T DOES DOG DOLLAR DOLL DOLLY DON T DONE DONKEY DOORBELL DOORKNOB DOOR DOORSTEP DOPE DO DOT DOUBLE DOUGh DOVE DOWN DOWNSTAIRS DOWNTOWN DOZEN DRAG DRAIN DRANK DRAWER DRAWING DRAW DPEA-m DRESSER DRESSMAKER DRESS DREW DRIED DRIFT DRILL DRINK DRIP DRIVEN DRIVER DRIVE DROP DROVE DROWN DROWSY DRUG DRUM DRUNK DRY -6UCK DUE- DUG bULL DUMB DUMP DURING DUST DUSTY DUTY 6-CE DWELL L51-4-E-Lf DYING EARLY EARN EAR EARTH EASTERN EAST EASY EATEN EAT EDGE EGG EH EIGHTEEN EIGHTH EIGHT EIGHTY EITHER ELBOW ELDER ELDEST -at-C-t-kfcfrii ELECTRIZ ELEPHANT 8-LEV-eN ELF ELM ELSE ELSEWHERE EMPTY ENDING ENtETSH ENJOif -EN6 -E-riEN-fte ENGINEER ENGINE ENOUGH ENTER ENVELOPE EQUAL ERASER ERASE PRRAND ESCAPE EVENING EVEN EVEI _V RYBO Y EVERYDAY EVERYONE EVERY EVERYTHING EVERYWHERE EVE EVIL EiCAET EXCEPT EXCHANGE EXCITED EXCITIN-t, EXCUSE EXIT EXPECT EXPLAIN EXTRA EYEBROW EYE VABLE FACE FACING FACTORY FACT FAIL FAINT FAIR FAIRY FAITH FAKE FALL FALSE FAMILY FANCY FAN FARAWAY FARE FAR-OFF FARMER FARMING FARM FAR FASTEN FAST FATHER FAT FAULT FAVORITE FAVOR FEAR FEAST FEATHER EF-BROARc( vrp FEED FEEL FEET FELLOW FELL FELT FENCE FEVER FEW FIB FIDDLE FIELD FIFE FIFTEEN FIFTH FIFTY FIGHT FIG FIGURE FILE FILL FILM FINALLY FIND FINE FINGER FINISH FIREARM FIRE_CRACKER_F.IREFLACE _FIRE FIREWORKS FIRING FIRST FIVE FIKHE.A-q-AN FISH Ffff FIT FITS FIX FLAG FLAKE FLAME FLAP FLASHLIGHT FLASH FLAT FLEA FLESH FLEW FLIES FLIGHT FLIP-FLOP FLIP __FLOAT_ FLOCK_ FLOOD FLOOR FLOP FLOUR FLOWER FLOWERY -Alow FLUTTER FLY FOAM FOGGY FOG FOLD Fails FOLLOWING FOLLOW FOND FOOD FOOLISH FOOL FOOTBALL FOOTPRINT FOOT FOREHEAD FOREST FORGET FORGIVE FORGOT FORGOTTEN FORK FORM FOR FORTH FORTH FORT FORTUNE FORTY kjuNb-- Ob-UNTAIN FOUR FO-URTEEN FREEZE FREIGHT FOX FRAME FREEDOM FREE ______._ FRENCH FRESH Off- FRIDAY FRIED FRIENDLY FRIEND FRIENDSHIP FRIGHTEN FROG FROM FRONT FUDGE FROST FROWN FROZE FRUIT FRY FURNITURE FUEL FULL FULLY FUNNY FUN "dALLOP FUR FURTHERi FUZiY GAIN ---6A-LEOti . 
GASOLINE GAME GANG GARAGE GARBAGE GARDEN GEAR GAS GATE GATHER GAVE GAY GEOGRAPHY GEESE GENERAL GENTLEMAN GENTLEMEN GENTLE GET GETTING GIANT GIFT GINGERBREAD GIRL


GIVEN GIVE GIVING GLADLY GLAD GLANCE GLASSES GLASS GLEAM GLIDE- GLOV GLOW GLUE GOAL GOAT GOBBLE GODG GODMOTHER GOD GOES GOING GOLDEN GOLDFISH GOLD GOLF GONE GOOD-BYE GOOD-BY GOOD-LOOKING GOODNESS GOOD GOODS GOODY GOOSEBERRY GOOSE GO GOT GOVERNMENT GOVERN GOWN GRAB GRACIOUS GRADE GRAIN GRANDCHILDREGRANDCHTLDGaANDDAUGHTE GRANDFATHER GRANDMA GRANDMOTHER GRANDPA GRAND GRANDSON GRANDSTAND GRAPEFRUIT GRAPE GRAPES GRASSHOPPER GRASS GRAVE GRAVEYARD GRAVY GRAY GRATEFUL GRAVEL_ GRAZE GREASE GREAT GREEN GREET GREW GRIND GROAN GROCERY GROUND GROUP GROVE GROW GUARD GUESS GUEST 0-rbE GDI-F GUM GUNPOWDER GUN GUY HABIT HADN T HAD HAIL HAIRCUT H4IRPfN HAIR HALF HALL HALT HAMMER HAM HANDFUL HANDKERCHIEF . _ HANDLE HAND HAN6WRfTING AANG HAPP-EN HAPPILY HAPPINESS HAPPY HARBOR HARDLY HARD HARDSHIP HARDWARE HARE HARK HARM HARNESS HARP HARVEST HA HASN T HAS HASTEN HASTE HASTY -HAI-CHET HATCH HATE HAT-- HAUL HAVEN T HAVE HAVING HAWK HAYFIELD HAY HAYSTACK HE 0 Ht LL HE S- HEADACHE -HEAD HEAL HEALTH HEALTHY HEAP HEARD HEARING HEAR HEART HEATER HEAT HEAVEN HEAVY HEEL HEIGHT HELD HELLO HELL HELMET HELPER HELPFUL HELP HEM 7 -HOTHOUSE HEN HERD HERE S. HERE HERO HER HERSELF HERS HE HEY HINORY HIDDEN HIDE 4ID HIGH HIGHWAY HILL HILLSIDE HILLTOP HILLY HIM HIMSELF HIND HINT HIP HIRE HIS HISS HISTORY HITCH HIT_ HIVE HOE HOG 4bLotii HOLD HOLE HOODAY. _HoLLow HOLY HOMELY HOMESICK HONEST HONEYBEE HONEYMOON HONEY Hat* 4bNOR HOOD HOOF HOOK HOOP HOPEFUL HOPELESS HOPE HOP HORN HORSEBACK HORSE HORSESHOE HO HOSE HOSPITAL HOST HOTEL HOT HOUND HOUR HOUSE HOUSETOP HOUSEWIFE HOUSEWORK HOWEVER HOWL HOW HUGE HUG HUMBLE HUMP H-Um HUNDRED HUNGER HUNGRY HUNG HUNK HUNTER HUNT HURRAH HURRIED HURRY HURT HUSBAND HUSH HUT HYMN I 0

I LL I M I VE ICE ICY IDEAL IDEA IF ILL IMPORTANT IMPOSSIBLE IMPROVE INCHES INCH INCOME INDEED INDIAN INDOORS INK INN IN INSECT INSIDE INSTANT INSTEAD INSULT INTEND INTERESTED INTERESTING INTO INVITE IRON ISLAND ISN T IS IT S IT ITSELF ITS IVORY IVY JACKET JACKS JAIL JAM JANUARY JAR JAW JAY JELLYFISH JELLY JERK JIG JOB JOCKEY JOIN JOKE JOKING JOLLY JOURNEY JOYFUL JOYOUS JOY JUDGE JUG JUICE JUICY JULY JUMP JUNE JUNIOR


JUNK JUST KEEN KEEP KEPT KETTLE KEY KICK KID KILLED KILL KINDLY KINDNESS K.INo_ KINppom KIN.._ KISS KITCHEN KITE KITTEN KITTY KNEEL -kNtE KNEW KNJVES KNOB KNOCK KNOT Kfq; K.NIT . -_, KNOWN KNOW LACE -LA615tR LADIES L-AD LADY LAID LAKE LAMB LAME LAMP LAND LANE LANGUAGE LANTERN LAP LARD LARGE LASH LASS LAST LATE LAUGH LAUNDRY' CKW--- CA-14------E-AWYER -a-V------LAZY LEADER LEAD LEAF LEAK LEAN LEAP LEARNED LEARN LEAST LEATHER LEAVE LEAVING LED LEFT LEG LEMONADE LEMON LEND LENGTH LESSON LESS LET S LET LETTER LETTING LETTUCE LEVEL LIBERTY LIBRARY LICE _ . LICK LID LIE LIFE LIFT LIGHTNESS LIGHTNING LIGHT LIKELY LIKE LIKING LILY LIMB LIME LIMP LINEN LINE LION LIP LISTEN LIST LIT LITTLE LIVELY LIVER LIVE LIVES LIVING LIZARD LOAD LOAF LOAN LOAVES LOCK LOCOMOTIVE LOG LONELY LONE LONESOME LONG LOOKOUT LOOK LOOP LOOSE LORD LOSER LOSE LOSS LOST LOT LOUD LOVELY LOVER LOVE LOW LUCK LUCKY LUMBER LUMP LUNCH LYING MACHINERY MACHINE MADE MAD MAGAZINE MAGIC MAID MAILBOX MAILMAN MAIL MAJOR MAKE MAKING MALE MAMA MAMMA MANAGER MANE MANGER MAN MANY MAPLE MAP MARBLE MARCH MARKET MARK MARRIAGE MARRIED MARRY MA MASK MASTER MAST MATCH MAT MATTER MATTRESS MAYBE MAYOR MAY MEADOW MEAL MEAN MEANS MEANT MEASURE MEAT MEDICINE MEETING MEET MERRY MELT MEMBER MEND MEN MEOW ME MESSAGE MESS METAL MET MEW MICE MIDDLE MIDNIGHT MIGHT MIGHTY MILE MILKMAN MILK MILLER MILLION MILL MIND MINER MINE MINT MINUTE MIRROR MISCHIEF MISSPELL MISS MISTAKE MISTY MITTEN -41-11- MONTH MIX MOMENT MONDAY MONEY MONKEY MOONLIGHT MOON MOO MOOSE MOP MORE MORNING MORROW MOSS MOSTLY MOST MOTHER MOVE MOTOR MOUNTAIN MOUNT MOUSE MOUTH MRS. MOVIE MOVIES MOVING MOW MR. 
MUDDY MUD MUG AD-a-- MULTIPLY- MUCH NAIL MURDER MUSIC MUST MY MYSELF NAUGHTV NAME NAPKIN NAP NARROW NASTY NECK NAVY NEARBY NEARLY NEAR NEAT NEIGHBORHOOD NECKTIE NEEDLE NEEDN T NEED NEGRO NEVERMORE NEIGHBOR NEITHER NERVE NEST NET NEW NEWSPAPER NEWS NEi-f-- N-IB-BilE NEVER NINETEEN NICE NICKEL NIGHTGOWN N.IGHT NINE NOBODY NOb NOIS-t---- NOISY-- -NONE NINETY NOSE NOON NOR NORTHERN NORTH NO NOWHERE NOTE NOTHING NOTICE NOT NOVEMBER


OAR NOW NUMBER NURSE NUT OAK --i OATMEAL OATS OBEY OCEAN OCLOCK OCTOBER ODD OFFER OFFICER OFF ICE OFF OF ONCE OFTEN --0--H OIL bi_b-FAtHi-ONEOLD ONE ONION ONLY ON ONWARD OPEN ORAN-GE -1-iiRCH-ARD ORDER b ke ORGAN OR OTHER OTHERW ISE OUCH OUGHT OUR OURSELVES OURS OUTDOORS OUTF IT OUTLAW OUTL INE OUT OUTS IDE OUTWARD OVEN OVERALLS OVERCOAT OVEREA T OVERHEID OVERHE AR OVERNiGHT 'OVER OVERTURN OWE OWING OWL OWNER OWN OX PACE PACKAGE PACK PAD ------0-ka -nib PA IL PA INFUL PA IN PAINTER PAINT ING PAINT PA IR PALACE PALE PAL PAMER ICA PANCAKE PANE PAN PANSY PANTS PAPA PAPER PARADE 04aDbN Prik-eNT ----15-KRK P-A-R-T1 Y- PAAT-NER --P ART PA R TY PA PASSENGER PASS PASTE PAST PATH PAT PATTER PAVEMENT PAVE PAW PAYMENT PAY PEACEFUL PEACE PEACHES PEACH PEAK PEANUT PEARL PEAR PEA PEAS PECK PEEK PEEL PEEP WEt------P E--N-C- I L PENNY PEN PEOPLE PEPP-ERMINT- PEPPER PERFUME PERHAPS PERSON PET PHONE -ii Ioro------1,Tatt. PICK PICNIC PICTURE PIECE P IE P I GEON PIGGY PIG PILE PI LLOW PI LL PINEAPPLE PINE P INK PIN P INT PIPE PISTOL PITCHER PITCH PIT P I TY' PLACE PLAIN -PLANE PLAN PLANT PLATE PLATFORM PLATTER PLAYER PLAYGROUND PLAYHOUSE PLAYMV, E PC-4Y-- PLAYTH ING PLEASANT PLEASE PLEASURE PLENTY POEM PLOW PLUG PLUM POCKETBOOK POCKET .. POINT POISON POKE POLE POLICEMAN POLICE PbL I S_H POL I TE POND PONIES PONY POOL PORK POOR -131--iritoki4 --fitfiliiED POP PORCH POTATOES POTATO POSSIBLE POSTAGE POSTMAN_ POST POT POUND POUR POWDER POWERFUL POWER PREPARE PRESENT PRETTY PRA I SE PRAYER PRAY . PRICE PRICK PRINCE PRINCESS PR INT PR ISON PROVE PR I ZE PROMIS E PROPER PROTECT PROUD . 
PUMPKIN PRONE-- -0-0 6 C it -PO l'ii i L-E- --p-o-o-f FoLiii PUMP PUNCH PUNISH PUPIL PUpPY_ PUP 0-5-Fi-E--- 15-6-AP-CE -ii-Ulit-E-- ii-U--§-14 -0-US§ -iiuStYCAT PUSSY PUT PUTTING PUZZLE QUACK QUARTER QU ICK QUART QUEEN QUEER QUEST ION QUICKLY QU I E TQocr _qui T..E _QUIT RABBIT RACE RA-tK k-AD-10 itAbI-H R-A6 RA ILROAti 'RA IL RAILWAY_ RAIN_Esow RAIN RA INY RA ISE RAI SIN ikAfittm_Y -fiATE- ii-ii-K iiiii CH- ----k-A-N6 TAN _ READY RAY REACH READER READING READ REAL REAP REAR REASON REBUILD REALLY._.. . RECE IVE RECESS IkEtbRo- ---ArdifiRts- A-E-DB-REAS-T- -At (5 . REMIND REFUSE REINDEER REJOICE REMAIN REMEMBER REPORT REMOVE -1k-tfir Ott-OA-IR --Fit0-iiti REPEAT R I B REST RETURN REVIEW REWARD RIBBON RICE RICH R IDDLE RIDER R IDE RIDING


RID RIGHT RIM RING RIPE RIP RISE RISING RIVER ROAD ROADSIDE ROAR ROAST ROBBER ROBE ROBIN ROB ROCKET ROCK ROCKY RODE ROLLER ROLC ROOM ROOSTER ROOT ROPP ROSEBUD :CO% ROT ROTTEN ROUGH ROUND ROUTE -10-WBOAT ROW ROYAL RUBBED RUBBER RUBBISH UB RUG RULER RULE RUMBLE RUNG RUNNER RUNNING RUN RUSH RUST RUSTY samxt SADNESS SAD SAFE :(AFETY SAID SAILBOAT SAILOR SAIL SAINT SALAD SANDY SALE SALT SAME SAND SANDWICH SANG SANK SAP SASH SATIN SATISFACTDRY SAVINGS SAT SATURDAY SAUSAGE SAVAGE SAVE SAW SAY SCAB SCALES SCARE SCARF SCHOOLBOY SCHOOLHOUSE SCHOOLMASTERSCHOOLROOM SCHOOL SCORCH SCORE SCRAPE SCRAP SCRATCH SCREAM SCREEN SEA SCREW SCRUB SEAL SEAM SEARCH SEEING SEASON SEAT SECOND SECRET SEED SELECT SEEK SEEM SEEN SEE SEESAW SELFISH SELF SELL SEND SENSE SENTENCE SEPARATE SEPTEMBER SERVANT SERVE SERVICE -s-Okif SEVENTEEN SET SETTING SETTLEMENT SETTLE SEVEN SEVENTH SEVENTY SEVERAL SEW SHADE SHADOW SHAME SHADY SHAKER SHAKE SHAKING SHALL SHAN T SHAPE SHARE SHARP SHAVE SHE D SHEEP SHE LL SHE S SHEAR SHEARS SHED SHINE SHEET SHELF SHELL SHEPHERD SHE SHOEMAKER SHINING SHINY SHIP SHIRT SHOCK SHOE SHONE SHOOK SHOOT SHOPPING SHORE SHORT SHOT SHOULDER SHOULDN T =ILO SHY SHOUT SHOVEL SHOWER SHOW SHUT SIGH SICKNESS SICK SIDE SIDEWALK SIDEWAYS SILL SIGHT SIGN SILENCE SILENT SILK SINGLE SILLY SILVER SIMPLE SINCE SINGER SINK SIN SIP SIR SIS SING SIXTEEN SISSY SISTER SIT SITTING SIX SKIN SIXTH SIXTY SIZE SKATER SKATE SKIP SKIRT SKI SKY SLAM SLAVE SLED SLEEP SLEEPY LIZVE Sf.Aft SLING SLEIGH SLEPT SLICE SLIDE SLID SLOWLY SLIPPED SLIPPER SLIPPERY SLIP SLIT SLY SMACK SMALL SMART SMELL SLOW SNAPPING SMILE SMOKE SMOOTH SNAIL SNAKE SNEEZE SNOWBALLS_NOWFL_AKE sNort SNOWY SNAP SOCKS sii-d-G-- .---O-Ak toil, SOB SNUFF SOLDIER SODA SOD SOFA SOFT SOIL SOLD SOLE SOMEBODY SdAEHO-W SOME-ONE SOME- SOMETHING SOMETIME SOMETIMES SOMEWHERE SONG SON SO SOON SORE SORROW SORRY SORT SOUTHERN SOUTH SOUL SOUND SOUP SOUR SPADE SPANK SPARROW SPEAKER SPEAK SPACE SPEND SPEECH SPEED_ SPELLING SpELL SPEAR SPIN SPENT SPIDEW 
--t-TIki SPILL SPINACH SPIT SPLASH SPOIL SPOKE SPOOK SPIRIT SPRINGTIME SPOON SPORT SPOT SPREAD SPRING


SPR I NKLE SQUARE SQUASH SQUEAK SQUEEZE SQUIRREL STABLE STACK STAGE STA IR STALL STAMP STAND STARE STAR START STARVE STATE STA TI ON SI:AY STEAK STEAL STEAMBOAT STEAMER STEM STEAM STEEL STEEPLE _STEEP STEER ST-E-PP.ING --STEP ST I CK ST ICKY ST IFF STILLNESS STILL ST ING STIR ST ITCH STOCKING STOCK STOLE STONE STOOD STOOL STOOP STOPPED STOPP I NG STOP STORE STOR I ES STORK STORM STCIRMY STORY STOVE STRAIGHT- STRANGER STRANGE STREAM STREET STRETCH STRAP STRAWBERRY STRAW... Stii I-NG STRIPES STR IP STRONG. gTUC-K STUDY STUFF STUMP STUNG SUBJECT SUCH SUCK SUDDEN SUFFER SUGAR SUIT SUMMER SUM SUNDAY SUNFLOWER SUNG SUNK SUNL IGHT SUNNY tUOIS-E SUN SUNSET S-U N-E SU-P-PER SUPPOSE SURELY SURE SURFACE SURPR I SE SWALLOW SWAMP SWAT-"- --"-ST4E-A-a-----" SWEATER -SWEAT SWEEP SWEETHEART SWEETNESS SWEET SWELL SWEPT SWIFT SWIMMING SWI M SW ING SWORD SWORE TAB LEC LOTH TABLE TABLESPOON TABLET TACK tAG. TAILOR TA I L TAKEN --I-AKE -IAK ING TALE TALKER TALK TALL TAME TANK TAk T415-E- ;ITO TARDY TAR TASK TASTE TAUGHT TAX TEACHER TEACH TEAM TEAR TEA TEASE TEASPOON TEETH TELEPHONE TELL TEMPER TENNIS TEN TENT TERM TERRIBLE ftt THANKFUL THANK THANKSGIVINGTHANKS THAN THAT S THAT THEATER THEE THEIR THEM THEN THERE THE THESE THEY 0 THEYLL THEY RE THEY VE THEY THICK THIEF THI MBLE THING THINK THIN THIRD THIRSTY THIRTEEN THIRTY THIS THORN THO THOSE THOUGH THOUGHT. THOUSAND THREAD THREE Ti--IRE-W THROWN THROW THUMB__ THROAT THRONE THROUGH _ THUNDER THURSO--/Ti' -TAY T ICKET TICKLE T I CK TIE T I GER TIGHT TILL TIME T INKLE TIN TI NY TI P TIPTOE TIRED TIRE TITLE TOAD TOADSTOOL TOAST TOBACCO TODAY TOE TOGE THER TO-ILET TOLD TOMATO TOMORROW TONE TONGUE TONIGHT TON TOOK TOOL TOO TOOTHBRUSH TOOTHP ICK TOOTH TOOT TOP TOWARD S TORE TORN TO TOSS TOUCH TOWARD TOWEL TOWER TOWN TOW TOY TRAP .TRA_CE TRACK_ TRADE TRAIN TRAM? 
TRAY TREASURE TREAT TREE TRICK TRICYCLE TRIED TRIM TRIP TROLLEY TROUBLE TRUCK TRY TRUE TRULY TRUNK TRUST TRUTH TUNE TUB TUESDAY TUG TULIP TUMBLE TWENTY TUNNEL TURKEY TURN TURTLE TWELVE UMBRELLA TWICE TWIG TWIN TWO UGLY UNDERSTAND UNDERWEAR UNDRESS UNFAIR UNFINISHED UNFOLD UNFRIENDLY UNHAPPY UNHURT UNIFORM UNITED STATES UNKIND UNKNOWN UNLESS UNPLEASANT UNTIL UNWILLING UPON UPPER UP UPSET UPSIDE USE UPSTAIRS UPTOWN UPWARD USED USEFUL

US VALENTINE VALLEY VALUABLE VALUE VASE VEGETABLE VELVET VERY VESSE1_ VICTORY VIEW .V4LAGE.______YINE vIeL_ET vIsuo ___..v!sIT _VOICE VOTE WAGON WAG WA-IST WAiT WAKEN WAKE WA.LK WALL WALNUT WANT WARM . . _ _ WARN WAR WASHER --iiASH ----WAskfa V44SN T WAS WASTE WATCHMAN WATCH WATERMELON WATERPR3OF WATER WAVE WAX WAY WAYSIDE WE D WE LL WE_RE WE VE WEAKEN WEAKNESS WEAK W CAD" i -I WEAPO-N. WEAR WEARY WEATITitR WFAVE WEB WEDDING WEDNESDAY WEED WEEK WEEP WEE WEIGH WELCOME WELL WENT WERE WE WESTERN WEST WET WHALE WHAT S WHAT WHEAT WHEEL WHENEVER WHEN WHM--- WHICH WHILE WHIPPED WHIP WHIRL WHISKY WHISPER WHISTLE WHITE WHO D WHO LL W4D- 3 WHOLE WHOM WHO WHOSE WHY WICKED WIDE WIFE WIGGLE WILDCAT WILD WILLING WILLOW WILL WINDMILL WINDOW WIND WINDY WINE WING WINK WINNER WIN WINTER WIPE WIRE WISE WISH WITCH WITHOUT WITH WIT WOKE WOLF WOMAN WOMEN WON T WONDERFUL WONDER WON WOODEN WOODPECKER WOOD WOODS WOOLEN WOOL WORD WORE WORKER WORKMAN WORK WORLD WORM WORN WORRY WORSE WORST. WORTH WOULDN T WOULD WOUND WOVE WRAPPED WRAP WRECK WREN WRING WRITE WRITING WRITTEN WRONG WROTE WRUNG YARD YARN YEAR YELLOW YELL YES YESTERDAY YET YOLK YONDER YOU D YOURSELF YOURSELVES YOURS YOU YOUTH ZZZZUZZZZU ZZZIZZZZUZZUZZUZZZUZZZUZZZUZZZUZZUZZZZZUZZUZZUZZUZZZZZZUZZZ IZZZZUZMUZZZUZZZZZUMUZUZIZZUMUZZZZZUZUZUZZUZZUZZZI2H ZIZZZUZZZZZZZZZZZZZZZZZZZZZZZZZZZZZUZZZZZZUZZUZZUZZZUZZZUZZZZUZZ ZZUZZUZZZUZZZZZZZZZZUZZZUZZUZZZZZZUZZUZZZUZZZZ7ZZZZZZZZIZZZZZU ZZZUZZUZZUZZIZZIZZZZUZUZZUZZZZZZZUZZZZZZUZZZZUZZZUZZUZUZZUZ ZZZZZUZZIZZZZZUZZZIZZZZIZZZZZUZZZZUZZZZZZUZZUZZZUZZUZZZUZZUZZZ ZUZZIIMZUZZMIZZUZZUZZZUZZUZZZZMUZZUZZUZZZINZUZUZZUZU ZZUZZZZUZZZUZZUZZUZZUZZZUMUZIZUZZUZZIZZUZZZZUZZZZZUZZZZU ZZIZZZZW:LIZZUZZZZUZZZIZZZZZUZZZUZZZZUZIZZZUZZZZZZUZZZUZZZZIZZZ ZUZZUZZUZZZZZZZZZZUZABSENSE ABSOLUTLY ABUNDENCE ABUNDENT ACADAMY ACCELLERATORACCEPTENCE ACCESSABLE ACCIDENTLY ACCOMODATE ACCOMODATIONACCROSS ACEDEMIC ACHEIVEMENT ACHIEVMENTACKNOWLEGE ACKNOWLEGINGACTUALY ACUMULATE ACUMULATING ACURACY ACURACY ACURATE ADEQUETE ADMITTENCE 
ADOLECENTADOLESENT ADVANTAGOUS ADVERTISMENTAFAIR AGGRESSIVE AGGRIVATE AGGRIVATING AGRAVATE AGRAVATING AGRESIVE AGRESSIVE AGRIVATE AGRIVATING ALLEDGE ALLEDGING ALLEGIENCE ALLOTED ALLWAYS ALOT ALOTTEb- ALPHEBET ALRIGHT AMOUNG AMUNG APARATUS APINION APOLIGIZE APOLOJIZE APPARANT APPARANT APPEARENCE AQUAINT AQUIRE AQUIRING ARGUEING ARGUEMENT ARGUEMENT ARRANGMENT ARTICAL ARTIC ASSASIN ATHELETE ATHELETIC ATITUDE ATTATCH ATTENDENCE ATTENDENT AUDIANCE AUGEST AUTHORATATIV AUTHORATY AUTOES AUTOES -BACKROUND BACEN(E "BALENCING SALOON BARBEROUS BARGIN BASICLY BASICLY BEATIFUL BEFOR BEGGER BEGINING BELEIF BELEIVE


BELEIVING BENIFICIAL BENIFITED BENIFITTED BEUTIFUL BEUTY BRECKFAST BIGER BISCIT BISCUT BOOKEEPER BOUNDRY BRITIN BULETTIN BULLETTIN BURGLER BURIEL BRETH. CAMPAIN BURYED BUSNESS CAFATERIA CALENDEk CAMOFLAGE CATAGORY CAPTIN CARFUL CATAGORIES-CATAGORIES CATAGORY CHALLANGE GELLER CEMETARIES CEMETARY CENTERY CERTIN CHARACTOR CHARICTER CHILDERN CIELING CIGARETE CHANGABLE COLLOSAL CIGERETTE CIGGARETE CIGGARETTE COFEE COLLEDGE COLOSAL COMERCIAL COMITTEE COMMING COMMITEE COLLOSSAL COMUNIST COMOAkTriVE- COMPATABLE COMPETANT COMPLEAT COMPLETLY CONCIEVABLE CONCIEVE CONCIOUS CONSEAL CONSENTRATE CONSERN -CONSIOUS- CoNSISTA-NCY to45zsTA-0- CONSISTANT CONVIENCE CONSTENT CONTEMPERARYCONTEMPORERYCONTRAVERSY CONTROLING CONVINIENCE CORELATE CORIDOR COROOUTE COROBORATINGCORPERATION CRITISISM CORRESPONDANCOUNCEL COURAGOUS CRITICICM CRITICISE CURICULLAW CO-gltUiLUMtuRicucum CURRING DEPENDANT CUSTUM DECENT DEFERENCE DEFINATE DEMOCRASY --Owno-P-E bItTASE DICIPLE DIFICULT DICIPLINE DICIPL1NING DIFERENT DIFFERANT DIFFRENT DINNING DISAPOINT DILEMA DILIGANT DILLEMA DILLIGENT OISASTEROUS DISATISFIEDDISCRIBE DISCRIMANATEDISCRIMANATIDISCRIMANATI 61SCRIPTION DISGISE OISIPLE DISIPLINE DISIPLINING DISPAIR DOMINENT DISSAPOOT DISSAPPOINT DISTRUCTIONDOCTER DOMINENT _ ECSTACY EFFICIENTCY EIGTH ELECTRICTY ELECTRISITY ELIGABLE ELIGABLE EMBARASS ELABERATE ENTIRLY EMBARRAS EMINANT EMPERER ENDEAVER ENTERPRIZE ENVIREMENT ENVOLVE EPADEMIC EPEDEMIC EQUIPE° ENTRENCE tXCEDe EQUIPPMENT ER-AtIC -EXAGERATE EXAGERATING EXAUST EXCELLANCE EXCELLANT EXELLENCE EXELLENT EXEPT EXCEDING EXITABLE EkERCIZE EXERSISE EXERSIZE EXIBIT EXISTANCE EXITING EXPENCE EXPERAMENT EXPEREMENT EXPERIANCE EXITE EXTREMLY EXPLAINATIONEXPLINATIONEXTRACURICULEXTRACURRICUEXTREAM FACINATION .FALLICIES FALLICY FAMILAR FAVOR-IT- -FANT-ACY FatiNA-tE FASINATING FASINATLON FAMILIER FORIEGN FEBUARY FEILD_ FICTICIOUS FICTIOUS FINALY FREIND FRI IGHTNING----kit OWL -F-OCLFIL FOURTY GODESS FUNDEMENTAL GARENTEE GAYETY GENERALY GILTY GOVENOR GOVERMENT GRAMMER 
GRANDURE GRANDUR GON GYMNAZIUM HANDKERCHEIF GRUSOME GUAGE GUARENTEE GUIDENCE . HEREDITERY HANKERCHIEF HAPPENNED HAPPYNESS HARRAS HAkkASS _NUMEROUS._HUMER HEROS ....HINDRENCE .HORRABLE JiTukABLY_ ^ HURRIDLY RYGEINE 'AOOtRACY IDEALY HUNDERD Y IMFORMATION IGNORENT IMAGINERY IMEDIATE IMENSE IGNORENCE IMPERTINANT IMPORTENCE IMMAGRANT IMMEDIATLY IMMENCE 1MMIGRENT I_MpORTENT_ _INCONVI.ENCEINC.ONVINIENCINCREDABLE IMPORTENCE IMPORTENT INFLUENCIAL INDEPENDANCEINDEPENDANt INDREDIANT NEVITIBLE INtVITIBLY INOCENT WELECT INTERFERANCEINTERPERTATI pl_g_you_sIpywIVE -JELOUS JEWELERY INTREST IRRELEVENT IRRESISTABLEJtALUS JUVENIL JUVIMILE LABEROR LABRATORIES JEWLERY JOURNIES LIKLIHOOD LAYED LEIZURE LICENCE LIESURE LABRATORY LONLINESS LOOSES LITRATURE LIVLIEST LIVLIHOOD LONLEYNESS LUXERY MABE 4WD-tit MAGNIFICENSEMAGNIFISENCE LUXERIES MARIAGE MARRAGE MAINTAINANCEMALICOUS MANER MANOUVER _ MATHAMATIC A-tDECAL .mEDEttgE- MEDOW MARRIDGE MATERIEL MISCHEIVOUS MENT METEPHOR MINAMUM MINITURE MELENCHOLY MORGAGE MORILLY MISPELL MONK1ES MONOTONUS MORELLY


MOSQUITOS MUSLE MUSSLE NARATIVE NATURALY NECCESARILY NECCESITY NECCESSARILYNECCESSARY NECCESSITY NECESARY NEGHBOR _NEGROS NEIC.E._ NEIGBOR NESECITY__ NESESARTLYNESESITY NESESSARY NESESSITY NICKLE NINETH NOTICABLE NUMERUS NUMOROUS NUSANCE _OBEDIANCE_OBEDIANT OBSTICLEOBSTICLE__ OCASSION OCCASSION OCCURANCE OCCURAFitE OttURENCE -OCCURENCE OCCURING OCCURRANCE OCCURRANCE OMITED OMMIT OMMIT OMMITTED OPION OPORTUNITY OPPERATE OPPONANT OPTOMISM OPTOMIST ORGINIZE ORIGINEL ORIGINIL PAMFLET PARALIZE PARIDISE PARLAMENT PARLEMENT PARRALEL PARRALLEL -0ARTICULER PARTICULER PAYED PEASENT PECULIER PERCIEVE PERFORMENCE PERMANANT PERMENENT PERSISTANT PERSONALY PERSONEL PERSUE PERSUING PERTINANT PMAMFLET PHENOMINA PHENOMINON PLAGERISM PLAGERIZE PLAJERISM PLAJERIZE PLAUSABLE PLAYWRITE PLEASENT POLITITIAN POLITITION PORTRATE PORTRAT PORTRIT POSESSIVE POSSESIVE POTATOS PRACTICLY PRARIE PRECEED PREC/CELY PREEMIUM PREFERED PREFERING PREFORMANCE PREPERATION PRESENSE PRESTEGE PREVELANT PREVELENT PRIMATIVE PRIMATIVE PRISINER PRIVILEDGE PROBIBLY PROCEEDURE PROFFESION PROFFESSION PROFFESSOR PROMINANT PROOVE PURSUADE QUANDITY QUESTIONAIREQUIZES RADIOES RAIDO RECCOMEND RECEVE RECEVING RECIEVE RECIEVING RECOMEND RECONIZE REFERANCE REFERED REFERING REHERSAL RELEGION RELEIVE RELEIVING RELETIVE RELEVENT RELIGEON RELITIVE REMBER REMEMBERANCEREPITION REPITITION REPITITION REPRESENTITIRESPONCE RESTARANT RESTARONT RESTURANT RESTURONT REVEEL RIDACULE RIDECULE RIDECULING RIGHTOUS ROOMATE RYTHM SAFTY SATERDAY SAYED SECRETERIES SECRETERY SEIGE SENCE SENE SENTANCE SEPERATE SEPERAT1NG SEPERATION SERVENT SEVEREL SHEPERD SHERRIFF SHERRIE SHINNING SHINNY SIEZE SIEZING SIGNIFICENCE SIGNITURE SIMILER SIMPEL SINCERLY SOCIATY SOPHMORE SOUVINIR SPAGETTI SPEACH SPONSER STEPED STOMACK STOPED STOPING STORYS STRECH STRENTH STUBORN STUDING SUBTILE SUBUB SUCEED SUCESS SUGER SUGEST SUMARIES SUMARY SUPERINTENDASUPORT SUPOSE SUPRESS SUPRISE SUPRISING SURBURB SURGON SURGURY SURJERY SURJON SUROUND SURPRIZE SURPRIZING SUSPENCE 
SUTLE SUVENIR SWIMING SWIMING SYMBEL SYMBLE SYMBOLE TALANT TECNIQUE TEMPERERY TEMPERMENT TEMPERTURE TEMPORERILY TEMPORERY TENDANCY TENDANCY TERRABLE TERRABLY THANKYOU THERFORE THIER THOUSEND THROUGHLY TOLERENCE TOLERENT TOMATOS TOMMOROW TOMMORROW TRAGADIES TRAGADY TRAJEDIES TRAJEDY TRANSFEREDTREASUROR TRESURER TRUELY TURKIES TYRANY UNATURAL UNECESSARY UNTILL USEING USFUL USLESS VACCUM VACCUUM VALUBLE VARIOS VARIUS. VEGATABLE VEGTABLE VEIW VEA-GENCE V-ILLI4 WIERD WENODEY WENEDSDAY WETHER WICH WIEGHT WRITEN WRITTING YEILD HOW WHAT WHERE WHICH WHOM WHO WHOSE ZUZZZZZZZZZZZZZZZZZUZZ AN ZZZZZZZZZZZZAFRICAN AFRICA ALL ANOTHER BOTH ANYBODY ANYONE ANY A BLOOD CAPTAIN CHINA CHINESE CIVILIZATIONCOLLEGE COLONEL . EACH CORPORAL COURAGE CROME. DAVID DR EITHER ENGLAND ENGLISH EVERYBODY EVERYONE EIGHT FRANCE EVERY FEW FIVE FOOD FOUR GOVERNOR FRENCH GENERAL GEORGE GERMAN GERMANY


HALF HER HE HIS HONESTY HUMAN ITALIAN ITALY IT ITS JAMES AEAN JOHN JUSTICE LESS LIEUTENANT _44_g_ LITTLE MA-AA-- MANKIND MAW MANY MARY MAYOR MEN MICHAEL MINE MORE MOST MR MUCH -MY-- NEITHER NINE NdTHING ONE OTHER PAUL PEACE PEOPLE PETER PLENTY PREMIER PRESIDENT PRIVATE PROFESSOR ROBERT RUSSIAN RUSSIA SCHOOL SERGEANT SEVEN SEVERAL . _ SHE-- §TW-- SOMEBODY SOMEONE SOME SPAIN SPANISH SUCH SUE SURVIVAL TELEVISION TEN THAT THEIR THE THESE- THEY THIS THOSE THREE TIME TWO WAR WATER WE -WIRDLE WILLIAM WOMEN YOUR YOU ZZZIZZZZZZIZZZZZZZZZIZZZUZZZUZZIZZUZZZIZZZZZI -----ZZ---ti-EZZ-LZ-ZZIZZZZZZIZZEZZIZZZZZIZIZZLIZZIZZIZZIZZIZZUZZZIZZZIZZUZZUZZZMUMMIUMIUMMUMUMUIMMUMUMMIUMMUM IMMUMZIUMMIIMUUMMIZIUZIUMMUMUMMIUMMIIUMUZ ACROSS AS HAS IS LESS THUS UNLESS WAS zazIMIZUYES- 991913123311101034131718202116151914399999999999999S999999999999999 99999 SIBSYS

APPENDIX B

TABLE IV-11 (A)

Predictor n = 25

SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS

COMPUTED FROM WHERRY FORMULA

(See Chapter IV)

Discovered Sample Size MULTR 100 125 150 175 200 225 250 275 300

.50 00 25 31 35 38 39 41 42 43 .51 lo 27 33 37 39 41 42 43 44 .52 15 29 35 38 41 42 43 44 45 46 .53 19 32 37 40 42 44 45 46 48 .54 23 34 39 42 44 45 46 47 48 49 42 45 46 48 49 49 50 .56 29 37 51 51 .57 31 39 43 46 48 49 50 .58 34 41 45 47 49 50 51 52 53 .59 36 43 47 49 50 52 52 53 54 .60 8 45 48 50 52 4 1 40 4 50 52 53 54 55 5 5 .62 42 48 51 53 54 55 56 57 57 58 .63 44 49 52 54 56 57 57 58 60 .64 46 51 54 56 57 58 59 59 65 57 48 52..0 -62 49 54 57 58 60 60 61 62 63 63 .67 51 56 58 60 61 62 62 .68 53 57 60 61 62 63 63 64 64 65 .69 55 59 61 62 63 64 65 65 6 o 6 60 62 6 6 6 66 66 7 8 .71 58 4 5 7 69 .72 60 63 65 66 67 68 68 69 69 70 70 .73 61 64 66 67 68 69 70 71 71 71 .74 63 66 68 69 69 2 6 6 6 o 1 1 2 2 73 73 73 .7 9 70 71 72 72 75 .77 67 70 71 72 73 74 74 74 76 .78 69 71 73 74 74 75 75 75 77 71 73 74 75 76 76 76 77 78 .80 72 74 75 76 77 77 77 78


TABLE IV-11 (B)

Predictor n = 30

SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS

COMPUTED FROM WHERRY FORMULA

(See Chapter IV)

Discovered Sample Size MULTR 100 125 150 175 200 225 250 275 300

.50 oo 10 25 31 34 37 38 40 41 .51 Oc 15 27 33 36 38 40 41 42 .52 oo 19 29 34 37 40 41 43 43 .53 00 23 32 36 39 41 43 44 45 .54 oo 26 34 38 41 43 44 45 46 .55 oo 28 36 40 42 44 45 47 47 .56 12 31 37 41 44 I6 ---177 .57 18 33 39 43 45 47 48 49 50 .58 22 35 41 45 47 48 50 50 51 .59 25 37 43 46 48 50 51 52 52 .60 =61 31 41 46 49 51 52 .62 34 43 48 51 52 54 55 56 56 .63 37 45 49 52 54 55 56 57 57 .64 39 47 51 54 55 56 57 58 59 65 41 49 53 55 57 58 59 59 60 5 54 5 5 .1 .67 46 52 56 58 59 60 61 62 62 .68 48 54 57 59 61 62 62 63 63 .69 50 56 59 61 62 63 64 64 65 .70 52 57 60 62 63 64 65 65 .66 .71 54 59 62 63 65 765 66 67 --77- .72 56 60 63 65 66 67 67 68 68 69 69 .73 57 62 64 66 67 68 68 70 71 .74 59 64 66 67 68 69 70 71 72 7 61 65 67 69 70 70 71

.77 64 68 70 71 72 73 73 74 74
.78 66 70 71 73 73 74 74 75 75
.79 68 71 73 74 75 75 76 76 76
.80 69 72 74 75 76 76 77 77 77

TABLE IV-11 (C)

Predictor n = 35

SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS

COMPUTED FROM WHERRY FORMULA

(See Chapter IV)

Discovered Sample Size MULTR 100 125 150 175 200 225 250 275 300

.50 00 00 14 25 29 33 36 37 39 .51 00 00 18 27 32 35 37 39 40 .52 00 00 22 29 34 37 39 40 42 .53 00 00 24 32 36 38 40 42 43 .54 00 11 27 34 37 40 42 43 44 00. 1? 0 6 9 42 43 45 46 5 00 21 32 3: 4 5 48 .57 00 24 34 39 43 45 46 47 .58 00 27 36 41 44 46 48 49 50 51 .59 00 30 38 43 46 48 49 50 52 .60 10 33 40 45 47 49 51 52 .61 17 35 42 46 49 51 52 53 54 .62 22 38 44 48 50 52 53 54 55 .63 26 40 46 50 52 53 55 56 56 58 .64 29 42 48 51 53 55 56 57 6 4 50 3 55 56 57 58 59 3 4 51 54 5 5 .67 38 48 53 56 58 59 60 61 61 .68 41 50 55 57 59 60 61 62 63 .69 44 52 56 59 60 62 62 63 64 0 46 54 58 60 62 63 64 64 65 .71 4 .72 51 57 61 63 64 66 66 67 67 67 68 68 69 .73 53 59 62 64 66 69 70 .74 55 61 64 66 67 68 69 68 69_, 70 71 71 72 5-9-64- 67 6 97 70 71 71 72 .77 61 66 68 70 71 72 73 73 73 .78 63 67 70 71 72 73 74 74 75 76 .79 65 69 71 73 74 74 75 75 .80 67 71 73 74 75 76 76 77 77


TABLE IV-11 (D)

Predictor n = 40

SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS

COMPUTED FROM WHERRY FORMULA

(See Chapter IV)

Discovered Sample Size MULTR 100 125 150 175 200 225 250 275 300

.50 00 00 00 16 25 29 33 35 37 .51 oo 00 00 20 27 32 34 37 38 .52 00 00 05 23 29 33 36 38 40 .53 00 00 13 26 32 35 38 40 41 .54 00 00 18 28 34 37 40 41 43 iiJlr___ti 4

.57 oo 06 28 35 - 39 42 44 46 47 .58 00 14 30 37 41 44 46 47 48 .59 00 19 33 39 43 45 47 49 50 .60 00 24 35 41 45 47 49 50 51 .61 06 27 38 43 46 49 50 51 '57' .62 00 30 40 45 48 50 52 53 54 .63 00 33 42 47 50 52 53 54 55 .64 10 36 44 48 51 53 54 56 56 _2115 18 38 46 50 53 54 57 58 23 41 48 52 54 56 57 58 59 .67 27 43 50 53 56 57 59 60 60 .68 31 45 51 55 57 59 60 61 62 .69 35 48 53 57 59 60 61 62 63 .70 38 50 1_485.467.0624_26i_434._ .71 41 52 5 2 .72 44 54 58 61 63 64 65 66 67 .73 47 56 60 63 64 66 67 67 68 .74 49 58 62 64 66 67 68 69 69 .75 52 60 63 66 67 68 69 70 70 .76 54 61 65 67 69 70 70 71 72 .77 56 63 67 69 70 71 72 72 73 .78 59 65 68 70 71 72 73 74 74 .79 61 67 70 72 73 74 74 75 75 .80 63 68 71 73 74 75 76 76 76 APPENDIX B

TABLE IV-11 (E)

Predictor n = 45

SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS

COMPUTED FROM WHERRY FORMULA

(See Chapter IV)

Discovered Sample Size MULTR 100 125 150 175 200 225 250 275 300

25 29 32 34 .50 00 00 00 00 18 27 31 34 36 .51 00 00 00 04 21 36 38 .52 00 00 00 13 24 29 33 17 27 32 35 37 39 .53 00 00 00 34 37 39 41 .54 00 00 00 21 29 00 00 0 1 6 38 40 42 44 5 00 CO 13 27 34 30 36 39 42 44 45 .57 00 00 18 44 45 47 .58 00 00 22 32 38 41 43 45 47 48 .59 00 00 26 35 40 .60 00 00 2 48 50 51 1 00 12 32 39 43 50 51 32 .62 00 18 34 41 45 48 51 53 54 .63 00 23 37 43 47 50 54 55 .64 00 27 39 45 49 51 53 54 55/ .65 00 31 42 47 50 53 54 54 56 .66 00 34 44 49 52 56 57 58 59 .67 00 37 46 51 54 59 60 61 .68 12 40 48 52 55 57 60 61 62 .69 20 42 50 54 57 59 58 60 61 62 62. .70 25 45 52 56 63 64 b5 .71 30 47 54 58 60 72 61 63 64 65 66 .72 34 49 56 59 61 63 64 66 66 67 .73 38 52 58 64 66 67 68 68 .74 41 54 59 62 68 49.. 4 70 0 7tii 48 58 63 66 67 69 67 69 70 71 72 72 .77 50 60 65 70 71 72 73 73 .78 53 62 66 69 71 73 74 74 75 .79 56 64 68 70 75 75 76 .80 58 66 70 72 73 74 APPENDIX B

TABLE IV-11 (F)

Predictor n = 50

SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS

COMPUTED FROM WHERRY FORMULA

(See Chapter IV)

Discovered Sample Size MULTR 100 125 150 175 200 225 250 275 300

25 29 32 .50 00 00 00 00 00 19 .51 00 00 00 00 11 22 27 31 3, .52 00 00 00 00 16 25 30 33 35 20 27 32 35 37 .53 00 00 00 00 39 .54 00 00 00 08 23 30 34 37 00 00 00 1 26 2 6 8 0 40 42 5 00 00 00 19 29 34 38 36 39 42 44 .57 00 00 00 23 31 45 .58 00 00 04 26 34 38 41 43 45 47 .59 00 00 14 29 36 40 43 .60 00 00 19 2 38 42 Q 47 48 --7417---- 00 00 23 34 40 44 46 48 50 50 51 .62 00 00 27 37 42 46 48 51 .63 00 00 30 40 44 47 50 53 54 .64 00 10 33 41 46 49 51 53 6 00 18 4tt 48 g13 4 00 23 3 50 52 54 5 5 57 58 .67 00 28 41 48 51 54 56 60 .68 00 31 44 50 53 55 57 59 60 61 069 00 35 46 51 55 57 59 61 62 0 00 8 : 6 60 0 3 .71 00 41 50 55 58 63 64 65 .72 16 44 52 57 60 61 63 64 65 66 .73 24 47 55 59 61 63 65 66 67 68 .74 29 49 56 60 2 8 62 66 6 68 62_ 9 70 70 .7 38 54 0 4 8 68 69 70 71 71 .77 42 56 62 65 72 73 .78 46 59 64 67 69 70 71 73 74 74 .79 49 61 66 69 71 72 75 75 .8o 52 63 68 70 72 73 74

TABLE IV-11 (G)

Predictor n = 55

SHRUNKEN MULTIPLE-REGRESSION COEFFICIENTS

COMPUTED FROM WHERRY FORMULA

(See Chapter IV)

Discovered                   Sample Size
MULTR     100  125  150  175  200  225  250  275  300

.50        00   00   00   00   00   08   19   25   28
.51        00   00   00   00   00   14   22   27   31
.52        00   00   00   00   00   18   25   30   33
.53        00   00   00   00   08   22   28   32   34
.54        00   00   00   00   15   25   30   34   36
.55        00   00   00   00   19   27   32   36   38
.56        00   00   00   00   23   30   35   38   40
.57        00   00   00   11   26   32   37   39   42
.58        00   00   00   17   29   35   39   41   43
.59        00   00   00   22   31   37   40   43   45
.60        00   00   00   --   --   --   --   --   --
.61        00   00   07   29   --   41   44   --   48
.62        00   00   16   32   39   43   46   48   50
.63        00   00   21   34   41   45   48   50   51
.64        00   00   25   37   43   47   49   51   53
.65        00   00   29   39   45   48   51   53   --
.66        00   00   32   42   47   50   53   54   --
.67        00   10   36   44   49   52   54   56   57
.68        00   18   38   46   51   54   56   57   58
.69        00   24   41   48   53   55   57   59   60
.70        00   29   --   --   --   --   --   --   --
.71        --   33   46   --   --   --   --   --   --
.72        00   37   49   54   58   60   62   63   64
.73        00   40   51   56   60   62   63   64   65
.74        00   43   53   58   61   63   65   66   67
.75        --   --   --   60   --   --   66   --   68
.76        --   --   --   --   --   --   --   --   --
.77        29   52   60   64   66   68   70   70   71
.78        34   54   62   65   68   69   71   71   72
.79        39   57   64   67   69   71   72   73   73
.80        44   59   66   69   71   72   73   74   75
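The entries of Table IV-11 can be cross-checked with a short Python sketch. It is an illustration only. It assumes the common form of the Wherry correction, shrunken R squared = 1 - (1 - R squared)(N - 1)/(N - n - 1), where N is the sample size and n the number of predictors; this form is assumed here because it reproduces the legible entries checked, and Chapter IV remains the authority for the formula actually used. The function names are hypothetical, and a negative estimate is reported as zero, as the tables print 00.

```python
import math


def wherry_shrunken_r(mult_r: float, n_predictors: int, sample_size: int) -> float:
    """Shrunken multiple R under the assumed Wherry correction."""
    degrees = sample_size - n_predictors - 1
    if degrees <= 0:
        raise ValueError("sample size must exceed the number of predictors plus one")
    r2_hat = 1.0 - (1.0 - mult_r ** 2) * (sample_size - 1) / degrees
    # A negative estimate means the discovered R is no better than chance.
    return math.sqrt(r2_hat) if r2_hat > 0.0 else 0.0


def table_row(mult_r: float, n_predictors: int, sample_sizes=range(100, 301, 25)):
    """One table row: shrunken R in hundredths for each sample size."""
    return [round(100 * wherry_shrunken_r(mult_r, n_predictors, n)) for n in sample_sizes]


if __name__ == "__main__":
    # Discovered R of .50 with 25 predictors, sample sizes 100 through 300.
    print(table_row(0.50, 25))
```

Under these assumptions, `table_row(0.50, 25)` gives 0, 25, 31, 35, 38, 39, 41, 42, 43, which agrees with the first row of Table IV-11 (A).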

APPENDIX C

COMPUTER PROGRAM Written by Donald Marcotte

SIBTIV PHRAS SUIROUTINE PHRASE CMKOVIINID/DMPA) COPINCIVIWBROEUP, TErr, LENGTH CCIGICS/PSWRELNANur CCANONMAPH/PHIDIAT INTEGliR RC (12) , rrErr(200),PHEW (300,8) CMCN/ACC/TOTAL, ID INTW1Et THRZ , ASTRK, QUOTE COEHONATE/NQUOTE DATA PERIOA, =IAA,QUESTA/2H .41, 3H.Xii, 3H . Q.*/ DATA COMAA/1H,/ INTEGER CCKHAA PERIOA,EXCLAA,QUESTA DATA ASTRIV1H*/ DATA THRZS/3H22Z/ REAL RUTZ (200) DOUBIZ PRECISION TErr(100),INDP41(254) BVIVALENCE(HLETXTpTEM, ITErr) LOGICAL INTABL REAL SWOP (80) IRMO LGTHEN ( 100) , RELN( 30),fErr,TOTAL NQUOTE is 0 LCSENT ai 0 QtfCCE AB 0 IPE - 0 DIEM sat VEIT ... 4 DO 3 ISENT at laLNUT IF( ISENTAT IPE)'Go TO 3 =ENT a' Ism ..Ism/2 IF(nsomr.m.Lcsorr) GO TO 5: =ENT es WSW (LUNT) .112.1111I0A. OR . ITEIS( ISBN? IF( MIXT ( ISIOIT).11QCONAA*Cit.ITEXT 1 )4031.413TPS) GO To 5 nr(INTANATErracsnaLninwposoan Go To118 00 TO 3 118 DO k VI is le300 UMW (BM) .B1 PHIAMT( IPB, 1 ) ) GO TO 44 OD TO 4 ABzsarr.+a. 44 ZECK1itrarr(IsPa)iziamanT(Ds,2)) GO TO7 4 COMM GO TO 5 7 IPE is ISPCW + 1 KACC gm 2 RC (1) at PHRHAT(IMO) RC ( 2) am PHRHAT(IPB,2) IPC .10 IPB ' DO 13 LIPC as 3,6 0.


IF(ITEXT(IPE).EQ.MIRMAT(IPC,IIPC))GO TO 21 IF(PHRMAT(IPC,IIPC).EQ.THRZS) GOTO 23 IF(KACC.EQ.2.0R,KACC.EQ.4.0RJACC.EQ.6)GO TO 84 GO TO 23 21 KACC KACC + 1 RC(KACC) PHRMAT(IPC,IIPC) IPE IPE + 1 13 CONTINUE GO TO 23 84IPB IPB + 1 IF(ITEXT(ISENT).2Q,PHRMAT(IPB,1)) GO TO49 IPE- ISENT + 2 GO TO5 49IF(ITEXT(ISPOW).EQ.PHRMAT(IPB,2)) GO TO7 IPE ISENT + 2 GO TO 5 23 ICFAST = ISENT 2 LCFAST = IPE Dig& KACC IF(ITEXT(ICFAST).4.ASTRK) GO TO 113 GO TO 221 113 IF(ITEXT(LCFAST.Eq.ASTRK) GO TO114 IF(ITEXT(LCFAST).EQ.COMMA.°R.ITEET(LCFAST).EQ.PERIGA.OR.ITErr(LCF 1AST).EQ.EXCLAA.OR.ITEXT(LCFAST.E7.QUESTA) GO TO-114 ICAC = LCFAST + 2 IF(ITEXT(LCFAST).EQ.COMAA.AND.ITEXT(ICAC).EQ.ASTRK) GO TO114 GO TO 221 114 QUOTE = 1 /QUOTE = NQUOTE + 1 221 WRITE(7,223) ID,IPC,QUOTE,(RC(IRC),IRC=1,LWOP) 223 FORMAT (5X, 7-5p5X,15 51, p5X, 8A6 ) TCTAL = TOTAL + 1 WRITE(6,923) (RC(IRC),IRC 1,LiioP) 923FORMAT(121OPHRASE IS02A6) DO 62 IRCA 1,8 62 RC(IRCA) 0 QUOTE-0 5CONTINUE RETURN END APPENDIX D

PL/I PROGRAM PARSE Written by Gerald Fisher

R# is an array. R#(i) is the rule number applied when parsing the i-th letter.

LEN <- 0

LEN is an array. LEN(i) is the length of the rule applied to word i.

Z <- S

Z is the push-down store (PDS); rather, Z gives the topmost symbol on the PDS.

i <- 1

J <- R#(i) + 1

Have we checked to see if any rule is applicable, i.e., is there a rule whose left-hand side is Z and whose handle is the i-th word of the sentence?

RULE MATCH / NO MATCH

LEN(i) <- length of rule; R#(i) <- J (the J-th rule applies)

pentest iøg

Remove top of PDS

FREE Z: Remove predictions made by R#(i)

Add more terminals in ee rulefoUowin hand side of handleto PDS R#(i) back& TRY AGAIN

Unsuccessful Parse
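The flowchart above describes predictive analysis with back-up over a grammar in Greibach standard form, where every rule has a left-hand side, one terminal handle, and a possibly empty string of predicted nonterminals. A minimal Python sketch of the same idea follows. It is not a transcription of PARSE: recursion stands in for the explicit R# and LEN bookkeeping, and the names `parse`, `Rule`, `ENGLISH_RULES`, and `LETTER_RULES` are hypothetical. The rules themselves are the small test grammars that appear in the listing and trace of this appendix.

```python
from typing import List, Optional, Sequence, Tuple

# A rule in Greibach standard form: (left-hand side, handle, predicted nonterminals).
Rule = Tuple[str, str, Tuple[str, ...]]

ENGLISH_RULES: List[Rule] = [
    ("S", "the", ("ADJ", "NOUN", "V", "ADJ")),
    ("NOUN", "boy", ()),
    ("V", "is", ()),
    ("ADJ", "happy", ()),
    ("ADJ", "smart", ()),
]

LETTER_RULES: List[Rule] = [
    ("S", "a", ("BB",)),
    ("S", "a", ("S", "BB")),
    ("BB", "b", ()),
]


def parse(sentence: Sequence[str], rules: Sequence[Rule], start: str = "S") -> Optional[List[int]]:
    """Return the 1-based rule number used for each word, or None if no parse exists."""

    def step(i: int, pds: Tuple[str, ...]) -> Optional[List[int]]:
        if i == len(sentence):
            return [] if not pds else None  # success only if the PDS is empty
        if not pds:
            return None  # words remain but nothing is predicted
        top, rest = pds[0], pds[1:]
        for number, (left, handle, predicted) in enumerate(rules, start=1):
            if left != top or handle != sentence[i]:
                continue
            new_pds = predicted + rest
            # Length test: every symbol left on the PDS must consume at least one word.
            if len(new_pds) > len(sentence) - i - 1:
                continue
            tail = step(i + 1, new_pds)
            if tail is not None:
                return [number] + tail
        return None  # no rule worked here, so the caller backs up and tries its next rule

    return step(0, (start,))


if __name__ == "__main__":
    print(parse("the smart boy is happy".split(), ENGLISH_RULES))
    print(parse(list("aabb"), LETTER_RULES))
    print(parse(list("aaabbba"), LETTER_RULES))
```

With these rule orderings the sketch assigns rules 1, 5, 2, 3, 4 to "the smart boy is happy" and rules 2, 1, 3, 3 to "aabb", and it returns `None` for "aaabbba" after exhausting every back-up.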

-253- S 1 ,` 1 /_p.:1_V.-,-A-N(xX,L:.)):rS70AK:(JC (61:T16,4S('IAIA); 11.Sr LI nth pROLFOURE: 3 . OCL Ir,1(3,10 CH4R(2) INITIAL( \A oii(1); I 1 I Iss,tmollso,IBRI,S : e 1 ite - /* S>S>4-BB A S 88 . */*/ S3(7)S2(3) CH,IW,(1) CHARM CHAK(1) CHAP(2) INITIAL( /* BAAAABB */ */ /* AAABBBA */ '5,909AA',I#09'51.91A'''S'9AA't1511 61-0bl:5*$-9-1 its . - /* S>AS> 6A S BB AA AA * / 'ry f-rtTST g-S-176-13-13-/'AL /* AA>Ob--> A - **1 / 54(4)13601-16T.9110-;10) CHARM INITIAL('A'9'889'8','4'), / /* /*A. ABBA */ .- 45 56(6)S5(5)DCL-JCL R3(6,o) PUTTYTIXEO; CHAR(5) INITIAL( CAP(i)CHAR(1 ) INITIALPP,'All,'13108191A'08');I14ITTAL-08'0A'''ul,11319181T-; /r1/4 S> THE ADJ NOUN V ADJ *7- /* BAB3AB ISIIITOE$1140jN'tIV'llA0,1/9'4119.4LU:\i4p4SJY/t1419'0,9i019#'9 e 4 s 1 4 s s s /*/* S> NOU4> THE.NOUN 80Y */V ADJ */ V > IS */ 1 3CLPuTtv191/S11,11111w,ppyll 57 PAGE; (5) CrIAR(3) oi,s 'ups 9le 4e 00); /* ADJ>ADJ>HAPPY SIART */*/ 11iC' 8 - PUT PAt,E; L.L_ PAkStAS1,k1,kA060; PAL,31.1(S7,R3,1-1k,xX); 121413 CALLC.:ALL PASE(5.3,P1ePRO(X)f1-ASLAS201,R00(); P4RSE(S4,R2,PR,XX); rAR':.L(r.,5,'12,k*tpxX); -c9XX); I.15 C'LLLzi iSiVt.K; ". F /# PARSE -- A PARSING PROCEDURE* USING P-ROCEDU-RE USING .-P.R.E..b.ICtfVEAN.ACYSIS;. PREDICTIVEAN"ALYS */ - pc-PARSE.--/*THIS PRuCEDUREsENTENLE *RuLES.A PARnti WILL-ACCORDING 06TA1N TOTHE A FIRSTSPECIFIED CGPRECT SET IT IS ASSUMED THAT THE GRAMMAR IS JWGREIBAtH CF COITEXT FREE RLIIRITING PUSING OF AN INPUT STaIDARD FORM* - ARE NON-TERMINAL AND A IS JERMINAL.'* THAT EACH' ISOF tHC Ftam-y->A0(1:;.x11-a=c1)-WHERCY'lxi,... * . -----:*IHE-0-WkAAETEKRE AS FOLLOwS:A*Al* R#RULESSENTENCEX ISIS THEATHE SWITCHPARSINGINPUT SET OFTOSENTENCE TREEGRAMMARINDICATE OUTPUT. TORULES. WHETHERBE PARSED. 
OR 'NOT THE PARSE WAS GOOD* 2 3CLROLES(*,*)(CHECK( SENTENCEi*) 1,1)):PARSE:PROC(SENTENCE,RULES,R#,X); CHAR(*), CHAR(*),(R4(*),LEN(DIM(SENTENCE,1))) /* THESE ARE THE CTL, /*NOTE THAT Z IS CONTROLLED;IT RULES;EACH ROWAS ONE RULE */ IS THE PDS */ FIXED, X !ATM; /* IF THE PARSE IS SUCCESSFUL THEN X IS 1 AND r, OTHERWISE */ - - A 43 M=DIM(RULES,1);N=DIM(SENTENCE,1); /*M /*N GIVES GIVES THE US NUMBER THE NUMBER /*NN GIVES THE MAX NUMBE OF RULES */ OF WORDS IN SENTENCE */ OF it. NAL : ' AA=LENGTH(RULES(1,1));N1=DIM(RULES,2)-2; /* MM GIVES THE */-- M'AX WORD LENGTH */ 101211 7 R#='1;LEN=0;X=IDIB;L=1;ALLOCATE/* FIRST /* THISZwE CHAR(MM)CHARIMM) SETIS AUP COUNTER /* THEINITIAL('S');INITIAL('#'); INITIALIZATION PUSH FOR DOWN THE STORE/*SYMBOL PDS */ */ FOR BOTTOM OF STORE */ 1413 00/* IF1=1 hERE Z=10 TO'N; wE THEN SEE GOIF TOAND RULENO WHOSEMATCH;/*WE NUMBERS HANDLE WE SHOULDN'T ISMAKE THE SURE CURRENT THAT INPUTwE WORD. /* STEP THRU ONE WORD AT A THERE' IS A RULE WHOSE LEFT HAND SIDE MATCHES THE PDS DON'T TRY THE SAME RULE TWICE. fIME */ EMPTY THE PDS THIS SOON. */ BY KEEPING TRACK OF */ 16, -- RULE_CHECK:DO J=R#(I)+1 TO M; 1917 _ ENO:IF RULES(J,1)=Z&RULFS(J,2)=SENTENCE(I) THEN. GO TO MATCH; , 222120_ NO_MATCH:R#(I),LEN(I)=0;IFI=I-1; I=C THEN GO TO FREE_UP; /* RESET IN /* wE MUST BACK UP HERE /*WE CAN DO NO MORE. *-7 CASE WE RETURN TO I */ */ 2426 .1-RFE l;00EV.);L=L-1; J=1 Ta /*kEMOVE ThELEN(I); PREDICTIONS MADE BY THIS RULE */ _ . 272d:5 L=L+1:ALiti;iTE 1 CH ( 1171-1 F x 1Aek-;,,:7 HA:vDLE */ . TO I-WLE_CHECK; t\E HAVE A/* MAYdE THEkE AATCHI. IS ANOTHER THUS, 14E REMOVE .6ITH THE SAME THE TOP-MnSTSw4301. /41-AT. THIS POINT ._ ._ 32 1:4-ATCH:.R911)=J;. /*KEMEMBERFkoi! TriC THE PDS ANi5 V:EPE-ACE if ..411-H-T-FIE RULE NUMBER. */- -PAR't OF THE RULE */ 33-3634 . 
SET_LEN:LE14(I)=K-1;END;IF RULESIJ,K+2)=.0 THEN-GO TO SET_LEN; WORDS THAN REMAIN_IN ______4038 FREE Z; F L+K-2>N-I /* REMOVEfEST--:ALL THE THIS kULE THEN GO TO TOP */ RULE_CHECK; REQUIRE MORE 7 41- /*L=1.-1; NOw AUU THE REMAINDER OF THE RULE TO THE PUS. */ 45--46 ------Ea);-/* THIS -6-0-K=LEN(I)L=L+1;END;ALLGCATE TO 1 BY Z-1; CHAR(MM);Z=RULES(J,K+2);ENDS -THE MAIN ANALYSIS LOUP THE PDS MUST */ BE EMPTY */ 485455 FREEFkEE_fJP:IFEND Z; PARSE; 1-,=.14° THEN /*riE HAVE A SUCCESSFUL'PARSE. DO;FREE Z;GO TO FREE_UP;END; . /;; -- AVARCINGPRCOILIIVE ANALYSIS. / APPE2UI1 D(Continued A TTR IBUTE TABLE

DCL NO LDENTII=IER . ATTRIBUTES

DI M GENERIC ,BOILT- INFUNCTION STATEMENT LABEL CONSTANT kJ, FR EE_UP

AUTO-WAY IC;;.1.11-61-0-Y; X-FrD( 5 0-)

. . . . . AUTOMAT IC, BINARY9FIXED(159r9.

AUTOMATIC,BINARY,f[XED(15,0)

AUTOMATIC,BINARY1FIXECI(150),

De5; LEN (41)gAUTOMAT I C

LENGTH GENERIC BUILT- INFUNCTION AUTOMATIC,BINARYIFIXEDi1510--

. 32 .STATEMENT t_ABEL CONSTANT

MM AUTOMAT IC 9B INARY. F 15.0/ AUTOMAT IC iBINARY,F IXEDI15 v.0

159.01 NM "AUTOM'ATIC. BINARY.FIXEDI

STATEMENT.. LABELCONSTANT. 20 . NUJ4ATCH SINGLE). 1 PARSE ENTRY, DECIMAL,FLTA-T( . 5/n1 2 R# (*) g PARAMETER.:)ECIMAL,FI.XEDI CONSTANT 16 RULE_CHECK. . STA EMENT LABEL IGNEOISTR LNG ,CHARKTE.R RULES (*9*) PARAMETER, AL

2 SEATENCE (*),RARAMETER,ALIGNEDISTRING;CH4FOcTER CONSTANT. 37 7 SELLEN STATEMENT L.AaEL PARAMETER STR 1MG OM . 2 x . .

."

. , CONTROLLED, STRING,CHARACTiii. . .

NO ERRORS OR WARNINGS DETECTED.

COMPILE TIME .12 MINS

.. . 4 . -.; "" n .,en/1-1n1. n /...},1 41, Z = 'ADJ 1; Sentencelai Sig the 'smut boy is happy. Led 3 Z InounV AdjSo itnoun(rule predicts v1) adj the ADJ N&JN V haply ADJ Z ,ADJ 3;2; ATTayApply thethe ruleruls NounAdj =LIsmart boy (rule 5) il 4;5; Appli the rule the rule .V -.4 Adj'...)is happy (rule 3)4)6) RR(1)auI us 6; RR(2)us IbidApply of sentence. 5 6 1 Pat(6)= 10 RR(7)- 0 R11(3)-Ra(8)- 03 RR(9)-RE(4)- 4 RR(10)-RR(5)- 0; ThollB; RR(5)3uRR(10)31 6 0; N Sentence at sabb ran 1;.11.22;1* ZoosZatS'Z=IBBI; 1, I; a/ta \BB b Z=111113,, 2; / RR(6)inRR(1)-RE(5)-i 0o2 RR(7)=RR(2)- 10 R11(8)-RR(3)- 03 RR(4)-RR(9)- 30 RRRR(5)an (10)= 0 0; XrpoltB;RR(10)=I= 1;0; 0; PARSED Se:itence Wks RR(6)=ER(1)=:05=101B; NOT PARSED 0o RR(7)=RR(2)=RR(10)=RR(5)al 0 0; RR(8)amRit(3)= 0 ER(9)=RR(4)- 0 RR(10)=RR(5)- 0; Sentence am aaabbba

1; IS I lel

Z='BB'; Try rule S --> a BB

I= 2; Doesn't work--no rule of the form BB --> a ...

I= 1; Back up one letter. Remove BB.

Z='S '; Put S back on PDS

Z='BB'; Try the rule S --> a S BB

Z='S '; Now PDS contains S BB

I= 2; Second letter again. The pair is (S, a)

Z='BB'; Try the rule S --> a BB. Now the PDS has BB BB

I= 3; Use the rule BB --> b. Doesn't work

I= 2; Use the rule s....)AS BB bit'S 1; Nadi PDS contains broBBI;

DialsII;

I= 3; Try third letter again

Z='BB'; Use the rule S --> a BB

I= 4; Use the rule BB --> b

I= 5; Use the rule BB --> b

I= 6; Use the rule BB --> b. Now the PDS is empty

I= 7; a is input. PDS is empty

I= 6; Back up.

Z='BB'; Put back on BB

I= 5; No rule. Back up

Z='BB'; Put back on BB

I= 4; No rule. Back up

Z='BB'; Put back BB

I= 3; No rule. Try again

Z='S '; Put back S. Try the rule S --> a S BB

zoos t; -260. zoos.LagI= I; 3; nOPutBack preCUML back up. S.LUIZ iur ' 7-6 ""I 0), Z=ISZ=ISI= I; I; 2;1; NoPutBack rule back up left S back up RR(6)=)0504),B;BR(1)=I= 00; That R11(10)=RR(5)=RR(7)=RR(2)3= all! 0 0; RR(8)-RR(3)- 0 RR(9)-RR(4)au 0 RR(10)-RR(5)- 0 0; ZusIAAI;I=I= 1;2; Sentence = ABBA ZiaDig7pRIAAI; S &BISZ=IBBI; I; 2; Ina 3;4; UpsIlIB;RR(6)11.RR(1)- 035; RR(10)=RR(5)=RRen=RR(2)= 02 0; RR(8)-R11(3)- 06 RR(9)-RR(4)- 05 RR(10)-RR(5)- 0 U; I=boBB1; 1;2; " FailedStatenceiy'v, because of the shaper teat BABES ,;"'",;" +".'w Mr 4,'"nra ',,:-45'4,40.;;c4%.7,-Tiqv.-A7,-.4-T,swpArl Z=137013B1;Zok1S 1; Z=1AA1; , 3;2; 7p.sw;DNB I I ; 2; Zan7008/1.1 ; 2;3; busISI= 0;1; RR(1)s,RR(6)er 00 RR(2)=RR(10)RRRI17.1 5)66) 0 0; RR(3)=RR(8)= 0 RR(9)-Ra(4)- RR(l0)-RR(5)ma 0 0; A1,111. 1, 1 , .1 , 1. , ,s1k1t* '4i 1 tit.11,tfttt 4 .e.t.475 t.et 0.1 t,., , f0 5 ze(at)m no(5)va 0 246)1nt O aN(C)Etigu(s)718 o C05 an(ot)ER 3-(apmin(L)vain(5)m sr9 fastiAIX =COHN-(9)HE 9 -Ma f(9 fir 11.111.w.1 qm mai 11i'k 1 cvs £1£slrfsaZ sone aliffautZ s Ss INC OREM at eoueluS tit:180NC References

Baker, Frank B. Experimental design considerations associated with large scale research projects. Paper presented at the Seventh Annual Phi Delta Kappa Symposium on Educational Research, Madison, Wisconsin, August 11, 1965.

Becker, J., and Hayes, R. M. Information Storage and Retrieval: Tools, Elements, Theories. New York: Wiley, 1963.

Bobrow, D. G. Syntactic Theories in Computer Implementations. In Harold Borko (Ed.), Automated Language Processing. New York: Wiley, 1967. Pp. 215-251.

Booth, T. L. Sequential Machines and Automata Theory. New York: Wiley, 1967.

Borko, Harold (Editor). Automated Language Processing. New York: Wiley, 1967.

Borko, Harold (Editor). Computer applications in the behavioral sciences. Englewood Cliffs, N. J.: Prentice-Hall, 1962. 633 p.

Bowles, Edward. Computers in Humanistic Research. Englewood Cliffs, N. J.: Prentice-Hall, 1967. Pp.2

Carne, E. B. Artificial Intelligence Techniques. Washington, D.C.: Spartan, 1965.

Carroll, John B. Linguistics and the psychology of language. Review of Educational Research, 1964, 34, 119-126.

Chomsky, Noam. Syntactic structures. 's-Gravenhage, Netherlands: Mouton and Company, 1957. 116 p.

Christal, R. E. JAN: A Technique for Analyzing Group Judgment. Technical Documentary Report PRL-TDR-63-3. February 1963. Project 7734, Air Force Systems Command, Lackland Air Force Base, Texas.

Daigon, Arthur. Computer Grading of English Composition. The English Journal, January, 1966, 46-52.

Dale, Edgar, and Chall, Jean. A formula for predicting readability. Educational Research Bulletin, 1948.

Diederich, Paul B., French, John W., and Carlton, Sydell T. Factors in judgments of writing ability. ETS Research Bulletin, 1961, No. 15. Pp. 60.

Doyle, L. B. Seven ways to inhibit creative research. Datamation, February, 1965.

Fowler, H. W. A Dictionary of Modern English Usage. Second edition revised by Sir Ernest Gowers. London: Oxford University Press, 1965. 725 p.

Friedman, N. and McLaughlin, C. Logic, rhetoric and style. Boston: Little, Brown, 1963. Pp. 230.

Garber, H. and Shostak, R. "Judge Viewpoints in Grading Essays." Paper presented at the Annual Meeting of the American Psychological Association, Washington, D. C., September 2, 1967.

Garvin, Paul L. (Ed.) Natural Language and the Computer. New York: McGraw-Hill, 1963. 3 p.

Gates, Arthur I. A list of spelling difficulties in 3876 words. New York: Bureau of Publications, Teachers College, Columbia University, 1937.

Gleason, H. A., Jr. What grammar? Harvard Educational Review, 1964, 34, 267-281.

Gleason, H. A., Jr. Linguistics and English Grammar. New York: Holt, Rinehart and Winston, 1965. Pp. 519.

Green, Bert F., Jr. Digital computers in research. New York: McGraw-Hill, 1963. 333 p.

Guth, Hans P. English Today and Tomorrow. Englewood Cliffs, N. J.: Prentice-Hall, 1964. Chapter 4, p. 165-217.

Hays, David G. Introduction to Computational Linguistics. New York: American Elsevier, 1967. Pp. 231.

Hiller, Jack H., Page, E. B. and Marcotte, D. R. A Computer Search for Traits of Opinionation, Vagueness, and Specificity-Distinction in Student Essays. Paper read at the Annual Meeting of the American Psychological Association, Washington, D. C., September 2, 1967.

Hodges, John C. Harbrace College Handbook, 3rd ed. New York: Harcourt, Brace, 1951. 494 p.

Hunt, Kellogg W. "Recent Measures in Language Development." Paper read to National Conference on Research in English, AERA Annual Meetings, Chicago, Illinois, February 19, 1966.

Jackson, D. N. and Messick, S. "Individual differences in social perception," British Journal of Social and Clinical Psychology, 1963, 2, Pp. 1-10.

Kelley, T. L. Fundamental Statistics. Cambridge: Harvard University Press, 1947.

Keyser, S. J., and Petrick, S. R. Syntactic Analysis, 1966. (In press in a forthcomingbook.)

Klein, S. and Simmons, R. F.A computational approach togrammatical coding of English words. soci_A______,...... alki,a.....)c)fCmtnkaJurnalofAs MichinerY. 1963, 10, 334-347.

Kuno, Susumu. Some characteristics of the Multiple-Path Syntactic Analyser. Language Data Processing, Cambridge: Harvard Computation Laboratory, 1964, C

Kuno, S. and Oettinger, A. Mathematical Linguistics and Automatic Translation. Report No. NSF-[number illegible]. Cambridge, Mass.: Harvard Computation Laboratory, 1965.

Korfhage, Robert R. Logic and Algorithms: With Applications to the Computer and Information Sciences. New York: Wiley and Sons, 1966. Pp. 94

LaBrant, Lou L. "Analysis of Cliches and Abstractions." English Journal 38:275-78; May 1949.

Lawson, H. W. PL/I List Processing. Communications of the ACM, 1967, 10(6), 358-367.

Levien, R. E. and Maron, M. E. "A Computer System for Inference Execution and Data Retrieval." Communications of the ACM, 1967, 10(11), 715-721.

Loban, Walter. The language of elementary school children. Champaign, Ill.: National Council of Teachers of English, 1963.

Lorge, Irving. The Lorge Formula for estimating difficulty of reading materials. New York: Bureau of Publications, Teachers College, Columbia University, 1959, 20 p.

Marcotte, Donald. Limt....:11eBeierAsmor. M.A. Thesis, University of Connecticut.

Marcotte, D. R., Page, E. B., and Mason, A. A Computer Analysis of Cliche Behavior in Student Prose. Paper read at the Annual Meeting of the American Educational Research Association, New York, February 18, 1967.

Marksheffel, Ned D. Composition, handwriting, and spelling. Review of Educational Research, 1964, 34, 177-186.

McColly, William and Remstad, Robert. Comparative effectiveness of composition skills learning activities in the secondary school. USOE Cooperative Research Project. Madison, Wisc.: Univ. of Wisconsin, 1961. 94 p.

McNemar, Quinn.Psychological Statistics, 3rd ed. New York: Wiley, 1962.

Merwin, Jack C. and Gardner, Eric F. Development and application of tests of educational achievement. Review of Educational Research, 1962, 32, 40-50.

Miller, George A. Some psychological studies of grammar. American Psychologist, 1962, 17, 748-762.

Minsky, M. L. Computation: Finite and Infinite Machines. Englewood Cliffs, N.J.: Prentice-Hall, 1967.

Myers, Albert E., McConville, Carolyn B., and Coffman, William E. Reliability of reasoning of the College Board English Composition Test, December 1963. ETS Research Bulletin, 1964, No. 42. 22 p.

Oettinger, A. G. Automatic Language Translation. Cambridge, Mass.: Harvard University Press, 1960.

Ogilvie, D. M., Dunphy, D. C., Smith, C., Stone, P. J., with Shneidman, E., and Farberow, N. Some characteristics of genuine versus simulated suicide notes as analysed by a computer system called the General Inquirer. Cambridge, Mass.: Harvard University Laboratory of Social Relations, 1962. Ditto.

Page, Ellis B. The Imminence of Grading Essays by Computer. Phi Delta Kappan, January, 1966, 238-243.

Page, Ellis B. Grading Essays by Computer: Progress Report. Proceedings of the 1966 Invitational Conference on Testing Problems. Princeton, N.J.: Educational Testing Service, 1967. Pp. 87-100.

Page, Ellis B. "Statistical and Linguistic Strategies in the Computer Grading of Essays." Proceedings of the Second International Conference on Computational Linguistics. Grenoble, France. August 24, 1967. No. 34.

Partridge, Eric. A Dictionary of Cliches with an Introductory Essay. Reprinted (with some additions). London: Routledge and Kegan Paul Ltd., 1962. 260 p.

Paulus, Dieter and Page, Ellis B. Nonlinearity in Grading Essays. Paper read at the Annual Meeting of the American Educational Research Association, New York, February 16, 1967.

Paulus, Dieter. Feedback in Project Essay Grade. Paper read at the Annual Meeting of the American Psychological Association, Washington, D.C., September 2, 1967.

Pence, R. W. A Grammar of Present-Day English. New York: Macmillan, 1947.

Postal, Paul M. Underlying and superficial linguistic structure. Harvard Educational Review, 1964, 34, 246-266.

Puckett, Arlette L. STUFF: String Utility Routines for FORTRAN IV. SHARE Program Library, 1966, S.M. 3454.

Quillian, M. Ross. Semantic Memory. Cambridge, Mass.: Bolt Beranek and Newman, 1966.

Sakoda, James M. DYSTAL manual: Dynamic Storage Allocation Language in FORTRAN. Providence: Brown University Sociology Computer Laboratory, 1964. 195 p. + app. Mimeograph.

Scott, T. M. Composition for College Students. New York: Macmillan, 1960.

Simon, Herbert A. Simulation of human thinking. In Martin Greenberger (Editor), Management and the computer of the future. Cambridge, Mass.: Massachusetts Institute of Technology Press, 1964. Pp. 94-131.

Stolz, Walter S., Tannenbaum, Percy H., and Carstensen, Frederick V. A stochastic approach to the grammatical coding of English. University of Wisconsin Communications Center, 1965. Ditto, 23 pp.

Stone, Philip. Paper read to the Conference on Uses of Educational Data Banks, Boston, Mass., Dec. 4, 1964. Unpublished.

Stone, Philip J., and Hunt, Earl B. A computer approach to content analysis: Studies using the General Inquirer system. Proceedings of the Spring Joint Computer Conference, 1963, 241-256.

Stone, Philip J., Dunphy, Dexter C., Smith, Marshall S., and Ogilvie, Daniel M. The General Inquirer: A Computer Approach to Content Analysis. Cambridge: M.I.T. Press, 19... Pp. 651.

Strunk, W., Jr., and White, E. B. The Elements of Style. New York: Macmillan, 1959.

Summey, George, Jr. American Punctuation. New York: Ronald Press Co., 1949.

Tatsuoka, M. and Tiedeman, D. V. "Statistics as an Aspect of Scientific Method in Research on Teaching." In Gage, N. L. (Ed.), Handbook of Research on Teaching. Chicago: Rand McNally, 1963. Ch. 4.

Warriner, John E. Handbook of English. New York: Harcourt, Brace, 1951. 594 p.

Warriner, John E. and Griffith, Francis. English Grammar and Composition. New York: Harcourt, Brace and Co., 1937.

Weizenbaum, J. Symmetric List Processor. Communications of the Association for Computing Machinery, 1963, 6(9), 524-544.

Woods, William A. Semantics for a Question-Answering System. Paper read at the Annual Meeting of the Association for Machine Translation and Computational Linguistics. Atlantic City, N.J. April 21, 1967.

Yngve, V. An introduction to COMIT programming. Cambridge, Mass.: Massachusetts Institute of Technology Press, 1962a.

Yngve, V. COMIT programmers reference manual. Cambridge, Mass.: Massachusetts Institute of Technology Press, 1962b. Pp. 61.