
ZeuScansion: a tool for scansion of English poetry

Manex Agirrezabal, Bertol Arrieta, Aitzol Astigarraga
University of the Basque Country (UPV/EHU)
Dept. of Computer Science
20018 Donostia
[email protected], [email protected], [email protected]

Mans Hulden
University of Helsinki
Department of Modern Languages
Helsinki, Finland
[email protected]

Abstract

We present a system, based on finite state technology, capable of performing metrical scansion of verse written in English. Scansion is the traditional task of analyzing the lines of a poem, marking the stressed and non-stressed elements, and dividing the line into metrical feet. The system's workflow is composed of several subtasks designed around finite state machines that analyze verse by performing tokenization, part of speech tagging, stress placement, and unknown word stress pattern guessing. The scanner also classifies its input according to the predominant type of metrical foot found. We also present a brief evaluation of the system using a gold standard corpus of human-scanned verse, on which a per-syllable accuracy of 86.78% is reached. The program uses open-source components and is released under the GNU GPL license.

1 Introduction

Scansion is a well-established form of poetry analysis which involves marking the prosodic meter of lines of verse and possibly also dividing the lines into feet. The specific technique and scansion notation may differ from language to language because of phonological differences. Scansion is traditionally done manually by students and scholars of poetry. In the following, we present a finite-state based software tool—ZeuScansion—for performing this task with English poetry, and provide a brief evaluation of its performance on a gold standard corpus of poetry in various meters.

1.1 Scansion

Conventionally, scanning a line of poetry should yield a representation where every syllable is marked with a level of stress—typically two or more levels are used—and groups of syllables are divided into units of feet. Consider, for example, the following line from John Keats' poem To autumn:

To swell the gourd, and plump the hazel shells

Here, a natural analysis is as follows:

- '      - '       - '        - '    - '
To swell |the gourd |and plump |the haz|el shells

We use the symbol ' to denote marked (ictic) syllables, and - to denote unmarked ones (non-ictic). That is, we have analyzed the line in question to follow the stress pattern

DE-DUM DE-DUM DE-DUM DE-DUM DE-DUM

and also to consist of five feet of two syllables each in the order unstressed-stressed. Indeed, this is the most common meter in English poetry: iambic pentameter.

The above example is rather clear-cut. How a particular line of verse should be scanned, however, is often a matter of contention. Consider a line from the poem Le Monocle de Mon Oncle by Wallace Stevens:

I wish that I might be a thinking stone

Here, matters are much more murky. Regarding the ambiguity in this line, the poet Alfred Corn notes that

    ... there is in fact room for disagreement about the scansion of this line. But Stevens is among the most regular of the metrists, and he probably heard it as five iambic feet.1 Still, an alternative scansion is: one iamb, followed by a pyrrhic foot,2 followed by two strong stresses, followed by two iambs.

1 Iambic foot: A weak-stressed syllable followed by a strong-stressed syllable.
2 Pyrrhic foot: Two syllables with weak stress.

In line with the above commentary, the following represents several alternative analyses of the line in question:

Examp.: I wish that I might be a thinking stone
1st:    - ' - ' - ' - ' - '

2nd:    - ' - - ' ' - ' - '
3rd:    - ' - ' ' ' - ' - '
4th:    - ' - - - ' - ' - '

The first variant is the meter (probably) intended by the author. The second line is Corn's alternative scansion. The third and fourth lines show the output of the software tools Scandroid and ZeuScansion, respectively.

In short, evaluating the output of automatic scansion is somewhat complicated by the possibility of various good interpretations. As we shall see below, when evaluating the scansion task, we use a gold standard that addresses this and accepts several possible outputs as valid.

2 The output of ZeuScansion

As there exist many different established systems of scansion, especially as regards minor details, we have chosen a rather conservative approach, which also lends itself to a fairly mechanical, linguistic-rule-based implementation. In the system, we distinguish three levels of stress and mark each line with a stress pattern, as well as make an attempt to analyze the predominant format used in a poem. The following illustrates the analysis produced by our tool of a stanza from Lewis Carroll's poem Jabberwocky:

1 He took his vorpal sword in hand:
2 Long time the manxome foe he sought-
3 So rested he by the Tumtum tree,
4 And stood awhile in thought.

1 - ' - ‘ - ' - '
2 ' ' - ‘ ' ' - '
3 ' ‘ - - - - ‘ - '
4 - ' - ' - '

In addition to this, the system also analyzes the different types of feet that make up the line (discussed in more detail below). ZeuScansion supports most of the common types of foot found in English poetry, including iambs, trochees, dactyls, and anapests. Table 1 shows a more complete listing of the types of feet supported.

Disyllabic feet
- -      pyrrhus
- '      iamb
' -      trochee
' '      spondee

Trisyllabic feet
- - -    tribrach
' - -    dactyl
- ' -    amphibrach
- - '    anapest
- ' '    bacchius
' ' -    antibacchius
' - '    cretic
' ' '    molossus

Table 1: Metrical feet used in English poetry

2.1 Metrical patterns

Once we have identified the feet used in a line, we can analyze for each line the most likely meter used. This includes common meters such as:

• Iambic pentameter: Lines composed of 5 iambs, used by Shakespeare in his Sonnets.

• Dactylic hexameter3: Lines composed of 6 dactyls, used by Homer in the Iliad.

• Iambic tetrameter: Lines composed of 4 iambs, used by Robert Frost in Stopping by Woods on a Snowy Evening.

3 Also known as heroic hexameter.

For example, if we provide Shakespeare's Sonnets as input, ZeuScansion classifies the work as iambic pentameter in its global analysis (line-by-line output omitted here):

Syllable stress _'_'_'_'_'
Meter: Iambic pentameter
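To make the meter-labelling step concrete, the following is a minimal sketch of how a per-line foot analysis can be mapped to a conventional meter name. This is illustrative Python under our own naming (name_meter and FOOT_NAMES are hypothetical helpers), not the finite-state implementation used by ZeuScansion, which reports the meter of a whole poem from the predominant per-line result, as in the Sonnets example above.

# Minimal sketch of naming a meter from a per-line foot analysis.
# Illustrative Python, not the FST implementation used by ZeuScansion.

FOOT_NAMES = {"-'": "iambic", "'-": "trochaic", "--'": "anapestic",
              "'--": "dactylic", "-'-": "amphibrachic"}
LINE_LENGTHS = {2: "dimeter", 3: "trimeter", 4: "tetrameter",
                5: "pentameter", 6: "hexameter", 7: "heptameter"}

def name_meter(feet):
    """Given a list of feet such as ["-'", "-'", "-'", "-'", "-'"],
    return a meter name such as 'iambic pentameter' (or None)."""
    if not feet or len(set(feet)) != 1:      # require a single repeated foot type
        return None
    foot, count = feet[0], len(feet)
    if foot in FOOT_NAMES and count in LINE_LENGTHS:
        return f"{FOOT_NAMES[foot]} {LINE_LENGTHS[count]}"
    return None

print(name_meter(["-'"] * 5))   # -> iambic pentameter
print(name_meter(["'--"] * 6))  # -> dactylic hexameter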

3 Related work

There exist a number of projects that attempt to automate the scansion of English verse. In this section, we present some of them.

Scandroid (2005) is a program that scans English verse in iambic and anapestic meter, written by Charles O. Hartman (Hartman, 1996). The source code is available.4 The program can analyze poems and check if their stress pattern is iambic or anapestic. But if the input poem's meter differs from those two, the system forces each line into iambic or anapestic feet, i.e. it is specifically designed to scan only such poems.

AnalysePoems is another tool for automatic scansion and identification of metrical patterns, written by Marc Plamondon (Plamondon, 2006). In contrast to Scandroid, AnalysePoems only identifies patterns; it does not impose them. The program also checks rhymes found in the input poem. It is reportedly developed in Visual Basic and the .NET framework; however, neither the program nor the code appear to be available.

Calliope is another similar tool, built on top of Scandroid by Garrett McAleese (McAleese, 2007). It is an attempt to use linguistic theories of stress assignment in scansion. The program seems to be unavailable.

Of the current efforts, that of Greene et al. (2010) appears to be the only one that uses statistical methods in the analysis of poetry. For the learning process, they used sonnets by Shakespeare, as well as a number of other works downloaded from the Internet.5 Weighted finite-state transducers were used for stress assignment. As with the other documented projects, we have not found an implementation to review.

4 http://oak.conncoll.edu/cohar/Programs.htm
5 http://www.sonnets.org

4 Method

Our tool is largely built around a number of rules regarding scansion developed by Peter L. Groves (Groves, 1998). It consists of two main components:

(a) An implementation of Groves' rules of scansion—mainly a collection of POS-based stress-assignment rules.

(b) A pronunciation lexicon together with an out-of-vocabulary word guesser.

(a) Groves' rules

Groves' rules assign stress as follows:

1. Primary step: Mark the stress of the primarily stressed syllable in content words.6

2. Secondary step: Mark the stress of (1) the secondarily stressed syllables of polysyllabic content words and (2) the most strongly stressed syllable in polysyllabic function words.7

6 Content words are nouns, verbs, adjectives, and adverbs.
7 Function words are auxiliaries, conjunctions, pronouns, and prepositions.

In section 5 we present a more elaborate example to illustrate how Groves' rules are implemented.

(b) Pronunciation lexicon

To calculate the basic stress pattern of words necessary for step 1, we primarily use two pronunciation dictionaries: the CMU Pronouncing Dictionary (Weide, 1998) and NETtalk (Sejnowski and Rosenberg, 1987). Each employs a slightly different notation, but they are similar in content: they both mark three levels of stress, and contain pronunciations and stress assignments:

NETTALK format:
abdication @bdIkeS-xn 2<>0>1>0<<0

CMU format:
INSPIRATION IH2 N S P ER0 EY1 SH AH0 N

The system primarily uses the smaller NETtalk dictionary (20,000 words) and falls back to CMU (125,000 words) in case a word is not found in NETtalk. The merged lexicon, where NETtalk pronunciations are given priority, contains some 133,000 words.
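To illustrate how stress patterns can be read off such entries, here is a minimal sketch that converts CMU-style lines into three-level stress strings (using ' for primary, ` for secondary, and - for no stress) and merges two lexicons with NETtalk priority, as described above. The function names are ours, and the NETtalk parsing is omitted because its alignment-based notation needs a separate small parser; the actual system compiles this information into finite-state transducers.

# Minimal sketch (not the actual ZeuScansion code): derive three-level stress
# strings from CMU-style dictionary lines and merge lexicons with NETtalk
# priority. CMU vowel phones carry a digit: 1 = primary, 2 = secondary, 0 = none.

STRESS_MARK = {"1": "'", "2": "`", "0": "-"}   # primary, secondary, unstressed

def cmu_stress(cmu_line: str) -> tuple[str, str]:
    """'INSPIRATION IH2 N S P ER0 EY1 SH AH0 N' -> ("inspiration", "`-'-")."""
    word, *phones = cmu_line.split()
    pattern = "".join(STRESS_MARK[ph[-1]] for ph in phones if ph[-1].isdigit())
    return word.lower(), pattern

def merge_lexicons(nettalk: dict[str, str], cmu: dict[str, str]) -> dict[str, str]:
    """NETtalk entries take priority; CMU only fills in missing words."""
    merged = dict(cmu)
    merged.update(nettalk)
    return merged

print(cmu_stress("INSPIRATION IH2 N S P ER0 EY1 SH AH0 N"))   # ('inspiration', "`-'-")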

[Figure 1: Structure of ZeuScansion. English poetry text is tokenized and POS-tagged; words not found in the dictionary are routed to the closest word finder. Groves' rules (1st and 2nd steps) and a cleanup stage then produce the metrical information.]

5 ZeuScansion: Technical details

The structure of the system is divided into the subtasks shown in figure 1. We begin with preprocessing and tokenization, after which words are part-of-speech tagged. Following that, we find the default stresses for each word, guessing the stress patterns if words are not found in the dictionary. After these preliminaries, we apply the steps of Groves' scansion rules and perform some cleanup of the result.

The toolchain itself is implemented as a chain of finite-state transducers using the foma8 toolkit (Hulden, 2009), save for the part-of-speech tagger, which is a Hidden Markov Model (HMM) implementation (Halácsy et al., 2007). We use Perl as a glue language to communicate between the components.

8 http://foma.googlecode.com

Preparation of the corpus

After tokenization,9 we obtain the part-of-speech tags of the words of the poem. For the POS tagger, we trained Hunpos10 (Halácsy et al., 2007) on the Wall Street Journal English corpus (Marcus et al., 1993). While other, more general corpora might be more suitable for this task, we only need to distinguish between function and non-function words, and thus performance differences would most likely be slight.

9 https://code.google.com/p/foma/wiki/FAQ
10 https://code.google.com/p/hunpos

Once the first process is completed, the system starts applying Groves' rules, which we have encoded as finite-state transducers. To apply the rules, however, we must know the stress pattern of each word. The main problem when assigning patterns is that the pronunciation of some words will be unknown, even though the dictionaries used are quite large. This often occurs because a word is either misspelled, or because the poem is old and uses archaic vocabulary or spellings. The strategy we used to analyze such words was to find a 'close' neighboring word in the dictionary, relying on the intuition that words that differ very little in spelling from the sought-after word are also likely pronounced the same way.

Finding the closest word

In order to find what we call the 'closest word' in the dictionary, we construct a finite-state transducer from the existing dictionaries in such a way that it will output the most similar word, according to spelling, using a metric of word distances that we have devised for the purpose. Among other things, the metric assigns a higher cost to character changes toward the end of the word than to those in the beginning (which reflect the onset of the first syllable), and also a higher cost to vowel changes. Naturally, fewer changes overall also result in a lower cost. For example, in the following line from Shakespeare's Romeo and Juliet:

And usest none in that true use indeed

we find the word usest, which does not appear in our lexicon.11 Indeed, for this word, we need to make quite a few changes in order to find a good 'close' match: wisest.

11 The archaic second-person singular simple present form of the verb use.
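The metric is described only qualitatively above, so the following is a minimal sketch, with invented weights, of a weighted edit distance that has the two stated properties: edits cost more toward the end of the word, and vowel changes cost more. In ZeuScansion itself this knowledge is compiled into a finite-state transducer rather than computed as below.

# Minimal sketch of a 'closest word' metric with the properties described above.
# The concrete weights are invented for illustration only.

VOWELS = set("aeiouy")

def edit_cost(a: str, b: str, pos: int, length: int) -> float:
    """Cost of changing character a into b at 0-based position pos."""
    cost = 1.0 + pos / max(length, 1)          # later positions cost more
    if a in VOWELS or b in VOWELS:
        cost += 0.5                            # vowel changes cost extra
    return cost

def word_distance(source: str, target: str) -> float:
    """Weighted Levenshtein distance between source and target."""
    n, m = len(source), len(target)
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + edit_cost(source[i - 1], "", i - 1, n)
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + edit_cost("", target[j - 1], j - 1, m)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if source[i - 1] == target[j - 1]:
                sub = 0.0
            else:
                sub = edit_cost(source[i - 1], target[j - 1], i - 1, n)
            dist[i][j] = min(dist[i - 1][j - 1] + sub,                                  # substitute
                             dist[i - 1][j] + edit_cost(source[i - 1], "", i - 1, n),   # delete
                             dist[i][j - 1] + edit_cost("", target[j - 1], j - 1, m))   # insert
    return dist[n][m]

def closest_word(oov: str, lexicon: list[str]) -> str:
    """Return the in-vocabulary word with the lowest distance to oov."""
    return min(lexicon, key=lambda w: word_distance(oov, w))

print(closest_word("usest", ["wisest", "waste", "user"]))   # -> wisest, as in the example above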

Groves' rules

Once we have obtained the stress pattern for each word, we begin to apply Groves' rules, starting with the primary step: stress the primarily stressed syllable in content words. This is implemented with a finite state transducer built from replacement rules (Beesley and Karttunen, 2003) that encode the steps in the rules. In our Hamlet example, for instance, our input to this stage looks like this:

to+-+TO be+'+VB or+-+CC not+'+RB to+-+TO ...
that+-+IN is+-+VBZ the+-+DT question+'-+NN
the+-+DT uncertain+-'-+JJ sickly+‘-+JJ
appetite+'--+NN to+-+TO please+'+VB

Next, we apply the second rule—that is, we mark the secondarily stressed syllables of polysyllabic content words and the most strongly stressed syllable in polysyllabic function words:

to+-+TO be+'+VB or+-+CC not+'+RB to+-+TO ...
that+-+IN is+-+VBZ the+-+DT question+'-+NN
the+-+DT uncertain+‘'-+JJ sickly+‘-+JJ
appetite+'-‘+NN to+-+TO please+'+VB

Cleanup

The last step is to remove all the material not needed to work with stress patterns. In this part, we get as input a sequence of tokens with a specified structure:

inspiration+‘-'-+NN

For the cleanup process, we use a transducer that removes everything before the first + character and everything after the second + character. It then removes all the + characters, so that the only result we get is the stress structure of the input word.

Global analysis

After the stress rules are applied, we attempt to divide lines into feet in order to produce a global analysis of the poem. Since foot division can be ambiguous, this is somewhat non-trivial. Consider, for instance, the meter:

-'--'--'--'-

which could be analyzed as consisting mainly of (1) amphibrachs [-'-], (2) trochees ['-], or (3) iambs [-']. All three patterns appear four times in the line. For such cases, we have elaborated a scoring system for selecting the appropriate pattern: we give a weight of 1.0 to hypothetical disyllabic patterns and a weight of 1.5 to trisyllabic ones. In this example, this would produce the judgement that the structure is amphibrachic tetrameter (1.5 × 4 matches = 6).

Foot        Pattern   Matches   Score
Amphibrach  -'-       4         6
Iamb        -'        4         4
Trochee     '-        4         4
Anapest     --'       3         4.5
Dactyl      '--       3         4.5
Pyrrhus     --        3         3
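The scoring scheme just described is easy to state procedurally. The following minimal sketch (ours, not the FST implementation) counts overlapping occurrences of each candidate foot in a stress string, weights disyllabic feet by 1.0 and trisyllabic feet by 1.5, and returns the best-scoring foot.

# Minimal sketch of the global-analysis scoring described above.

FEET = {"Iamb": "-'", "Trochee": "'-", "Pyrrhus": "--",
        "Anapest": "--'", "Dactyl": "'--", "Amphibrach": "-'-"}

def count_overlapping(line: str, pattern: str) -> int:
    """Count occurrences of pattern in line, allowing overlaps."""
    return sum(1 for i in range(len(line) - len(pattern) + 1)
               if line[i:i + len(pattern)] == pattern)

def best_foot(line: str) -> tuple[str, float]:
    """Return the foot name with the highest weighted match count."""
    scores = {}
    for name, pattern in FEET.items():
        weight = 1.0 if len(pattern) == 2 else 1.5
        scores[name] = weight * count_overlapping(line, pattern)
    name = max(scores, key=scores.get)
    return name, scores[name]

print(best_foot("-'--'--'--'-"))   # -> ('Amphibrach', 6.0)

Pairing the winning foot with its number of occurrences (four, in this case) then yields the amphibrachic tetrameter label mentioned above.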
6 Evaluation

As the gold standard material for evaluation, we used a corpus of scanned poetry, For Better For Verse, from the University of Virginia.12 We extracted the reference analyses from this website, which was originally built as an interactive on-line tutorial to train people in the scansion of English poetry in traditional meter. Sometimes several analyses are given as correct.

12 http://prosody.lib.virginia.edu

The results of the evaluation are given in table 2. As seen, 86.78% of syllables are scanned correctly in the best configuration. We include scores produced without the word guesser component to show its significance in the process.

            Scanned lines   Correctly scanned   Accuracy
No CWF      759             173                 22.79%
With CWF    759             199                 26.21%

            Scanned sylls.  Correctly scanned   Accuracy
No CWF      7076            5802                81.995%
With CWF    7076            5999                86.78%

Table 2: ZeuScansion evaluation results against the For Better For Verse corpus. The CWF label indicates whether the closest word finder was used for assigning stress to unknown words.

For checking the number of correctly scanned syllables on each line, we use Levenshtein distance in comparing against the gold standard. We do this in order not to penalize a missing or superfluous syllable—which are sometimes present—with more than 1 count. For example, the two readings of Stevens' poem mentioned in the introduction would be encoded in the corpus as

-+-+-+-+-+|-+--++-+-+

while our tool marks the line in question as

-+---+-+-+

after conversion to using only two levels of stress from the original three-level marking. Here, the minimum Levenshtein distance between the analysis and the reference is one, since changing one - to a + in the analysis would equal the first 'correct' possibility in the gold standard.
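A minimal sketch of this per-line scoring, under our reading of the description above: compute the Levenshtein distance between the system's two-level pattern and each acceptable gold pattern, keep the minimum, and subtract it from the reference length to obtain the number of correctly scanned syllables. That last subtraction is our assumption; the paper does not spell it out.

# Minimal sketch of the per-syllable evaluation described above. The gold
# standard may list several acceptable patterns per line (separated by '|');
# we take the minimum Levenshtein distance over them.

def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def line_score(system: str, gold_field: str) -> tuple[int, int]:
    """Return (correctly scanned syllables, total syllables) for one line."""
    golds = gold_field.split("|")
    best_gold = min(golds, key=lambda g: levenshtein(system, g))
    distance = levenshtein(system, best_gold)
    total = len(best_gold)
    return max(total - distance, 0), total   # crediting total - distance is our assumption

# The Stevens line from the introduction: distance 1 to the first gold reading.
print(line_score("-+---+-+-+", "-+-+-+-+-+|-+--++-+-+"))   # -> (9, 10)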

Closest word finder

Since the closest word finder has some impact on the overall quality of the system, we have evaluated that component separately. Figure 2 shows a graph illustrating the increasing coverage of words depending on the distance to the neighboring word used as a pronunciation guide for out-of-vocabulary items. The first column of the graph (NC) represents the percentage of the corpus that could be read using only the dictionaries, while in the following ones we show the improvements we get in terms of coverage using various substitutions. The codes that appear in the lower part of figure 2 refer to the allowed changes. The first letter can be either B or E. If it is B, the changes are made at the beginning of the word. The character following the hyphen describes the changes we allow subsequently: for example, VC corresponds to the change of one vowel and one consonant.

[Figure 2: Evaluation of the closest word finder]

7 Discussion and future work

In this work, we have presented a basic system for scansion of English poetry. The evaluation results are promising: a qualitative analysis of the remaining errors reveals that the system, while still containing errors vis-à-vis human expert judgements, makes very few egregious errors. The assignment of global meter to entire poems is also very robust.

We expect to develop the system in several respects. Of primary concern is to add statistical information about the global properties of poems to resolve uncertain cases in a manner consistent with the overall structure of a given poem. Such additions could resolve ambiguous lines and try to make them fit the global pattern of a poem. Secondly, there is still room for improvement in unknown word performance. Also, the part-of-speech tagging process may profitably be replaced by a deterministic FST-based tagger such as Brill's tagger, as presented in Roche and Schabes (1995). This would allow the representation of the entire tool as a single FST.

We believe that the availability of a gold-standard corpus of expert scansion offers a valuable improvement in the quantitative assessment of the performance of future systems and modifications.

Acknowledgments

We must thank Herbert Tucker, author of the "For Better for Verse" project, which has been fundamental to the evaluation of our system. We also must mention the Scholar's Lab, an arm of the University of Virginia Library, without whose aid in development and ongoing maintenance support the cited project would be impossible. Joseph Gilbert and Bethany Nowviskie have been most steadily helpful there.

References

Beesley, K. R. and Karttunen, L. (2003). Finite-state morphology: Xerox tools and techniques. CSLI, Stanford.

Carroll, L. (2003). Alice's adventures in wonderland and through the looking glass. Penguin.

Fussell, P. (1965). Poetic Meter and Poetic Form. McGraw Hill.

Greene, E., Bodrumlu, T., and Knight, K. (2010). Automatic analysis of rhythmic poetry with applications to generation and translation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 524–533. Association for Computational Linguistics.

Groves, P. L. (1998). Strange music: the metre of the English heroic line, volume 74. English Literary Studies.

Halácsy, P., Kornai, A., and Oravecz, C. (2007). Hunpos: an open source trigram tagger. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pages 209–212. Association for Computational Linguistics.

Hartman, C. O. (1996). Virtual muse: experiments in computer poetry. Wesleyan University Press.

Hulden, M. (2009). Foma: a finite-state compiler and library. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics: Demonstrations Session, pages 29–32. Association for Computational Linguistics.

Keats, J. (2007). Poems published in 1820. Project Gutenberg.

Marcus, M. P., Marcinkiewicz, M. A., and Santorini, B. (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330.

McAleese, G. (2007). Improving scansion with syntax: an investigation into the effectiveness of a syntactic analysis of poetry by computer using phonological scansion theory.

Plamondon, M. R. (2006). Virtual verse analysis: Analysing patterns in poetry. Literary and Linguistic Computing, 21(suppl 1):127–141.

Roche, E. and Schabes, Y. (1995). Deterministic part-of-speech tagging with finite-state transducers. Computational Linguistics, 21(2):227–253.

Sejnowski, T. J. and Rosenberg, C. R. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 1(1):145–168.

Shakespeare, W. (1609). Shakespeare's sonnets. Thomas Thorpe.

Shakespeare, W. (1997). Romeo and Juliet. Project Gutenberg.

Shakespeare, W. (2000). The tragedy of Hamlet, volume 1122. Project Gutenberg.

Stevens, W. (1923). Harmonium. Academy of American Poets.

Weide, R. (1998). The CMU pronunciation dictionary, release 0.6.