David W. Packard Computer-Assisted Morphological Analysis of Ancient

DAVID W. PACKARD

COMPUTER-ASSISTED MORPHOLOGICAL ANALYSIS OF ANCIENT GREEK

INTRODUCTION ¹

This system for automated morphological analysis of ancient Greek had its origin in a practical need rather than a theoretical concern for natural language parsing. Our immediate goal was to develop a new textbook and curriculum for teaching ancient Greek to American university students. Most traditional methods assume that a student is willing to spend at least one year in the study of grammar before reading any significant quantity of literature. Our conviction is that students can begin reading literature very early in the first year if the initial grammatical instruction is focused on those features of the language which actually occur in the texts first read.

To test this theory, we have used the computer to help produce a complete lexical and grammatical analysis of 40,000 words of ancient Greek selected from texts which students might wish to read in their first year. We have concentrated our attention initially on morphological analysis since the complexity of Greek morphology is the major obstacle to learning the language. We have prepared statistical summaries of the morphology of each text, as well as complete concordances organized both according to dictionary lemma and morphological category. Although our first goal is to collect information for a textbook, it is obvious that an automated system for morphological analysis will have uses far beyond the teaching of elementary Greek.

¹ I wish to thank the University of California for sponsoring this work under its program Innovative Projects in University Instruction.

THE METHODS OF ANALYSIS

Our system is based on a combination of computer analysis and subsequent editorial verification. The program is able to identify most Greek words automatically, but we always examine carefully the resulting analysis and make provision for correcting and supplementing it manually wherever necessary.
No prior editing of the text is required; the program will accept any Greek text which includes the normal diacritical signs (accents, breathing marks, and iota subscripts). The words are analyzed in the order in which they appear in the text without being sorted into alphabetical order.

Each word is first examined to determine whether it occurs in a list of exceptional forms. If the word is found in this list, it is not subjected to further analysis. This list, which we call the indeclinable list, contains forms which are not inflected (prepositions, adverbs, particles, etc.) as well as forms whose inflection is highly irregular. The list currently contains about 800 entries. Roughly 50% of the words in a typical text are found in this list.

Words not in the indeclinable list must be analyzed according to the rules of Greek morphology. Since the analytical procedure has refinements peculiar to Greek, it cannot be understood without some knowledge of how Greek nouns and verbs are inflected. Most inflection in Greek consists of adding an ending to a fixed stem. The present active indicative of the verb γράφω 'write' is conjugated: γράφ-ω, γράφ-εις, γράφ-ει, etc. The task of analyzing such forms is simply one of segmenting the word into a stem and an ending.

The program first removes the final letter of the word and determines whether that letter appears in the table as an inflectional ending. If it does, the remainder of the word is tentatively assumed to be a stem, and a search is made for this stem in the dictionary. If the stem exists and is consistent with the ending, the program identifies this as one possible analysis. The program then continues by searching for longer endings. Each possible combination of stem and ending must be examined since the word may be ambiguous.

In some cases the original juncture between stem and ending is obscured by phonological changes.
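The lookup and segmentation procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the program's actual code: the names `INDECLINABLE`, `ENDINGS`, `STEMS`, `strip_diacritics`, and `analyze` are hypothetical, the tables hold a handful of sample entries, and diacritics are simply stripped before lookup, which the real system need not do.

```python
import unicodedata

# Hypothetical miniature tables; the real lists are far larger.
INDECLINABLE = {"και": "conjunction", "εν": "preposition"}

# ending -> list of (inflection class, grammatical label)
ENDINGS = {
    "ω": [("omega-verb", "pres. act. ind. 1 sg.")],
    "εις": [("omega-verb", "pres. act. ind. 2 sg.")],
    "ει": [("omega-verb", "pres. act. ind. 3 sg.")],
}

# stem -> (dictionary lemma, inflection class)
STEMS = {"γραφ": ("γράφω", "omega-verb")}


def strip_diacritics(word):
    """Assumption of this sketch: accents and breathings are ignored."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def analyze(word):
    """Return every (lemma, segmentation, label) consistent with the tables."""
    plain = strip_diacritics(word)

    # Step 1: exceptional forms are looked up and not analyzed further.
    if plain in INDECLINABLE:
        return [(plain, plain, INDECLINABLE[plain])]

    # Step 2: try endings of length 1, 2, 3, ... and keep every analysis,
    # because the word may be ambiguous.
    analyses = []
    for n in range(1, len(plain)):
        stem, ending = plain[:-n], plain[-n:]
        for infl_class, label in ENDINGS.get(ending, []):
            entry = STEMS.get(stem)
            # The stem must exist and be consistent with the ending.
            if entry is not None and entry[1] == infl_class:
                analyses.append((entry[0], stem + "-" + ending, label))
    return analyses
```

With these sample tables, `analyze("γράφεις")` yields a single analysis whose segmentation is `γραφ-εις`, while a word whose stem is missing from `STEMS` yields an empty list and would be left for the later steps or for manual correction.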
The verb ἀγαπάω 'love' was originally conjugated with the same endings as γράφω: ἀγαπά-ω, ἀγαπά-εις, ἀγαπά-ει, etc., but in the standard literary dialect of Athens (Attic) the adjacent vowels contract, giving the following forms: ἀγαπῶ, ἀγαπᾷς, ἀγαπᾷ, etc. We treat these contracted forms as a separate conjugation with a separate set of endings, even though this produces a false division between stem and ending: ἀγαπ-ῶ, ἀγαπ-ᾷς, ἀγαπ-ᾷ, etc. With the proper selection of such pseudo-stems, it is possible to break down nearly any inflected form into a stem and an ending.

Especially troublesome are nouns of the third declension, where the nominative singular and dative plural are subject to a great variety of sound changes. The nominative νύξ 'night', for example, and its dative plural νυξί must be placed in the indeclinable list since they cannot be reconstructed directly from the stem νυκτ-. Words like μάθημα 'lesson', however, are broken into a pseudo-stem μαθη- and the endings -μα, -ματος, -ματι, -ματα, -μάτων, -μασι. This allows us to avoid the need to enter each nominative singular and dative plural into the exception list, as would have been necessary with the linguistically correct stem μαθηματ-. Such decisions of expediency do not affect the final analysis but only the construction of the tables used by the program.

A Greek verb can have six principal parts; these are the stems which form the basis for conjugation. The present stem of γράφω, for example, is γραφ-, the future γραψ-, the aorist γραψ-, the perfect γεγραφ-, the aorist passive γραφ(θ)-. Past tenses of the indicative (aorist, imperfect, pluperfect) augment these stems by prefixing an initial ἐ-, or, if the stem begins with a vowel, by lengthening that vowel. The imperfect indicative built on the stem γραφ- is ἔ-γραφ-ον, ἔ-γραφ-ες, ἔ-γραφ-ε, etc.
The aorist indicative built on the stem ἐλθ- is ἦλθ-ον, ἦλθ-ες, ἦλθ-ε, etc. In an earlier version of our program we included in the dictionary both augmented and unaugmented forms of each stem (γραφ- and ἐγραφ-, ἐλθ- and ἠλθ-). This was uneconomical since the augmented form is nearly always predictable. The dictionary now contains only unaugmented stems except for a few verbs like ἔχω and ὁράω which are augmented in special ways (εἶχον and ἑώρων). Reduplicated perfect stems, however, are entered in the dictionary.

Greek shows great freedom in forming compound verbs by the addition of prepositional prefixes. From the stem γραφ- is derived παρα-γράφ-ω 'write beside', κατα-γράφ-ω 'write down', ὑπο-γράφ-ω 'write under', etc. It would be uneconomical to include παραγραφ-, καταγραφ-, and ὑπογραφ- in the dictionary since all are formed by the addition of common prefixes to the single verb stem γραφ-.

A difficulty arises, however, from the fact that the prefixes are often assimilated phonetically to the following letter. The prefix συν- 'together' appears as συν- before vowels and dental consonants, as συμ- before labial consonants, as συγ- before guttural consonants, as συλ- before λ, and as συ(σ)- before σ. The prefix μετα- appears as μετα- before consonants, μεθ- before vowels with aspiration, and μετ- before vowels without aspiration. The program must recognize the assimilated forms of each prefix and must verify that the letter following the prefix could in fact have caused the suspected assimilation.

In some cases a single verb is compounded with as many as three prefixes, each of which may appear in an assimilated form. The form [illegible] must be analyzed as [illegible] + [illegible] + [illegible], and [illegible] as [illegible] + [illegible] + [illegible] + [illegible]. Further complication is caused by the fact that verbal augments come before the stem but after the prefixes. The imperfect of [illegible] is [illegible].
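The check that the letter after a prefix could have caused its assimilated form might look like the sketch below, restricted to the forms of συν- listed above. It is an illustration only: the letter classes, the table `SYN_FORMS`, and the function `split_syn_prefix` are hypothetical names, the input is assumed to be already stripped of accents and breathings, and augments standing between prefix and stem are not handled here.

```python
VOWELS = set("αεηιουω")
DENTALS = set("τδθ")
LABIALS = set("πβφμψ")
GUTTURALS = set("κγχξ")

# Surface form of the prefix -> letters that may follow that form,
# following the distribution of συν- given in the text.
SYN_FORMS = {
    "συν": VOWELS | DENTALS,
    "συμ": LABIALS,
    "συγ": GUTTURALS,
    "συλ": {"λ"},
    "συσ": {"σ"},
    "συ": {"σ"},
}


def split_syn_prefix(plain_word):
    """Return every hypothetical (prefix, surface form, remainder) split.

    A split is kept only if the first letter of the remainder could have
    caused the surface form of the prefix. Several splits may be returned;
    a later dictionary search decides which remainder is a real stem.
    """
    splits = []
    for form, allowed_next in SYN_FORMS.items():
        if not plain_word.startswith(form):
            continue
        rest = plain_word[len(form):]
        if rest and rest[0] in allowed_next:
            splits.append(("συν", form, rest))
    return splits
```

For example, `split_syn_prefix("συμβαινω")` returns `[("συν", "συμ", "βαινω")]`, whereas a spelling such as `"συμλεγω"` returns an empty list because λ cannot turn συν- into συμ-. Returning all surviving candidates rather than the first reflects the text's point that a division is only hypothetical until the remainder has been analyzed.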
Thus, if the word cannot be analyzed directly into a stem and an ending, the program must attempt to remove prepositional prefixes from the beginning of the word. If a hypothetical prefix can be removed, the program proceeds to analyze the remainder of the word. If this analysis is successful, the prefix is reunited with the word in the final analysis. In some cases the program makes more than one hypothetical division between prefix and stem. The verb ἀνα-λύω would generate three hypothetical divisions: ἀνα + ἀλύω, ἀνα + ἁλύω, and finally ἀνα + λύω.

If the word still cannot be analyzed, the program attempts to isolate a verbal augment at the beginning of the stem (but after any prepositional prefixes). Most imperfect and aorist indicative verbal forms are analyzed only at this point. The program often generates several hypothetical unaugmented stems. The imperfect indicative of ἄγω is ἦγον. In analyzing this form, the program would make two initial false attempts, augment + ἠγ- and augment + ἐγ-, before finding the correct analysis, augment + ἀγ-. Both prefix and augment may be ambiguous. The form [illegible] would produce eight hypothetical divisions, each combining a prefix, an optional augment, and a candidate stem; the individual divisions are illegible in the source.

Final short vowels are often elided, especially in poetry. If the program finds an apostrophe at the end of a word, it hypothetically restores each short vowel in turn. Most elided forms can be reconstructed successfully by this method.

Crasis, the merging of two words into one, is more difficult to recognize automatically. We simply enter the most common examples (e.g.
