A Statistical Study of Latin Elegiac Couplets

C.W. Forstall¹ and W.J. Scheirer²
¹ Department of Classics, State University of New York at Buffalo
² Department of Computer Science, University of Colorado at Colorado Springs

Motivation

We are interested in understanding the nature of the sound that is constrained by elegiac couplets: does it reflect the voice of the poet, or the general style of the elegiac form?

Within the Digital Humanities, stylistic studies have been produced for a wide variety of literature, including poetry. Existing feature sets and analysis techniques have most often examined texts at the word level. A word-level examination captures only part of the underlying sound content of a poem, which is fundamental to its composition. Here we introduce a variety of sound-based statistical features found to be useful descriptors of Latin poetics.

In this work, we look at the role repetitive sound plays in the Latin elegiac couplet, where just a single character-level bi-gram can be a defining component of the form. We are working to incorporate our feature sets and classification components into the University at Buffalo's Tesserae project [2], an online tool that gives scholars of Latin poetry easy access to sophisticated textual analysis tools. This work is part of an ongoing study of repetitive sound and its relationship to style in poetry.

Elegiac Couplets

The elegiac meter [4] is used for a variety of themes, most notably love [5]. The elegiac couplet is a pair of two different one-line verses:

[Figure: metrical scheme of the elegiac couplet]

In the above, — represents a long syllable and ˘˘ a pair of short syllables; the two symbols superimposed represent the poet's choice of either one long or two shorts. The first verse is identical to a verse of dactylic hexameter; the second, often called the "pentameter" verse of the couplet, is shorter by two half-feet.

Catullus 85
ōd’ ĕt ămō. quār’ īd făcĭām, fōrtāssĕ rĕquīrīs.
nēscĭŏ, sēd fĭĕrī sēntĭ’ ĕt ēxcrŭcĭōr.
I hate and I love. Perhaps you ask why I do it.
I don't know, but I feel it happening, and I am in torment.

Tibullus 1.5, lines 75-76
nēscĭŏ quīd fūrtīvŭs ămōr părăt. ūtĕrĕ, qua͞esō,
dūm lĭcĕt: īn lĭquĭdā nāt tĭbĭ līntĕr ăquā.
Sneaky Love is up to something. Enjoy it while you can, I beg:
your boat sails in clear waters.

Ovid, Amores 1.10, lines 29-30
sōlă vĭrō mŭlĭēr spŏlĭīs ēxūltăt ădēmptīs,
sōlă lŏcāt nōctēs, sōlă lĭcēndă vĕnīt.
Woman alone delights in what she steals from a man;
she alone hires out her nights, she alone comes up for sale.

The Functional n-gram Analysis

Observation: Sound plays a fundamental role in an author's style, particularly for poets.

The functional n-gram is a feature for stylistic analysis in which the power of the Zipfian distribution is realized by selecting the most frequently occurring n-grams as features, while preserving their relative probabilities as the actual feature values.

Feature: The Functional n-gram

In this work, we consider primitive sound elements as functional character-level bi-grams:

$$P(e_n \mid e_{n-N+1}^{n-1}) = \frac{C(e_{n-N+1}^{n-1}\, e_n)}{C(e_{n-N+1}^{n-1})} \quad \text{iff} \quad \mathrm{freq}(e_{n-N+1}^{n-1}\, e_n) > \varphi$$

Here C(·) is the count of a character sequence, φ is a frequency threshold, and N = 2 for bi-grams.

Functional n-grams for elegiac couplets:
er: the top bi-gram common to all poets considered
nt: the bi-gram with the greatest metrical variation
um: a bi-gram sensitive to the meter signal
am: a bi-gram sensitive to the meter signal

Latin elegists considered in this study: Catullus, Ovid, Propertius, Tibullus.
Other Latin poets considered in this study: Horace, Juvenal, Lucan, Lucretius, Statius, Vergil.

The Significance of the Bi-gram er

The values taken on by a distinct functional n-gram have been found to vary by meter and poet, and they can reveal much about the style of a single poet. Calculating the associated probabilities for er over a collection of 50-line samples spanning the entire Catullan corpus exposes a clear break between the elegiac poems (65-116) and the rest.

[Figure: Analysis of stylistic difference in the elegiac couplets of Catullus. Bi-gram frequencies for the sequence 'er' in 50-line samples; probability plotted against sample. Average probability: Catullus 1-60, 0.179; Catullus 61-64, 0.171; Catullus 65-116, 0.241. A major stylistic shift occurs between poems 64 and 65; poem 65 begins the elegiac corpus.]

The standard deviation of the bi-gram frequency er, calculated over samples drawn from a particular poet, indicates the additional presence of an author signal [1, 3]. For instance, among 50-line samples representing the three individual books of Tibullus, the highest standard deviation belongs to Book 3, which is attributed to a collection of poets, including Tibullus, Sulpicia, and other (often inferior) writers.

[Figure: Analysis of stylistic variation between different books of Tibullus. Bi-gram frequencies for the sequence 'er'; probability plotted against sample. Standard deviation: Book 1, σ = 0.03226; Book 2, σ = 0.04335; Book 3, σ = 0.0532. The highest deviation is in Book 3, which is generally attributed to other poets.]

A Comparison of Two Meters

Beyond bi-gram frequencies, useful results were obtained from mean word length, the feature most sensitive to meter. The number of characters per word tended to be higher for dactylic hexameter than for elegiac couplets, both within and between authors. Catullus 64 was dramatically higher: it separated completely from the rest of the Catullan corpus and was generally higher than samples from any author in either meter.

[Figure: Word length in elegiac couplets and dactylic hexameters. Mean word length (characters per word, 5.0 to 6.2) plotted against sample index (0 to 2000). Red: elegiac couplets (catull. eleg., ov. eleg., prop. eleg., tib. eleg.). Black: dactylic hexameters (catull. hex., hor. hex., juv. hex., luc. hex., lucr. hex., ov. hex., stat. hex., verg. hex.).]

Problem: There is no large database of poets who wrote in both meters.
Solution: Split the elegiac corpus into two halves, a hexameter half and a pentameter half, by cutting each couplet in two.

A preliminary study considered Catullus, Ovid, Propertius, and Tibullus, using samples of 150 randomly chosen words. Features considered: the bi-gram frequency nt, the ratio um:am, and word length.

Results: All features are sensitive to the difference between hexameter and pentameter. As expected, word length was greater for the hexameter half of the elegiac couplet than for the pentameter half, but it was still not as high as for stichic (continuous) hexameters. One model to explain this postulates a blending of a genre-dependent signal with the meter signal.

References

1. C. Forstall, S. Jacobson and W. Scheirer, "Evidence of Intertextuality: Investigating Paul the Deacon's Angustae Vitae," presented at Digital Humanities, July 2010.
2. N. Coffee, J. Koenig, S. Poornima, C. Forstall and R. Ossewaarde, The Tesserae Project. http://tesserae.caset.buffalo.edu
3. C. Forstall and W. Scheirer, "Features from Frequency: Authorship and Stylistic Analysis Using Repetitive Sound," Chicago Colloquium on Digital Humanities and Computer Science, 2009.
4. M. Platnauer, Latin Elegiac Verse: A Study of the Metrical Usages of Tibullus, Propertius & Ovid. Cambridge University Press, 1951.
5. G. Conte, Latin Literature: A History, translated by J.B. Solodow. Johns Hopkins University Press, 1999.
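The functional n-gram definition given under The Functional n-gram Analysis can be illustrated with a short Python sketch. This is not the authors' implementation. The following are assumptions made for illustration: the preprocessing (lowercasing and dropping everything but letters, so bi-grams may cross word boundaries), the reading of freq(·) as a bi-gram's share of all bi-grams in the sample, and the hypothetical names `functional_bigrams`, `phi`, and `top_k`. The sketch takes the most frequent bi-grams, keeps those that clear the threshold φ, and stores P(e_n | e_{n-1}) = C(e_{n-1} e_n) / C(e_{n-1}) for each one.

```python
from collections import Counter

def letters_only(text: str) -> str:
    # Assumption: lowercase and drop spaces and punctuation, so bi-grams may
    # span word boundaries; the poster does not describe its preprocessing.
    return "".join(ch for ch in text.lower() if ch.isalpha())

def functional_bigrams(text: str, phi: float, top_k: int = 10) -> dict[str, float]:
    """Map each selected bi-gram e_{n-1} e_n to P(e_n | e_{n-1}).

    Candidates are the top_k most frequent bi-grams; each is kept only if its
    share of all bi-grams exceeds phi (assumed reading of freq(...) > phi).
    """
    chars = letters_only(text)
    bigrams = Counter(chars[i:i + 2] for i in range(len(chars) - 1))
    histories = Counter(chars[:-1])  # C(e_{n-1}): times each letter starts a bi-gram
    total = sum(bigrams.values())
    features = {}
    for bigram, count in bigrams.most_common(top_k):
        if count / total > phi:
            # Maximum-likelihood estimate C(e_{n-1} e_n) / C(e_{n-1}).
            features[bigram] = count / histories[bigram[0]]
    return features
```

For a Latin sample, `functional_bigrams(text, phi=0.01)` would return the conditional probabilities of its most frequent bi-grams. The value of φ shown here is arbitrary and not taken from the poster.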
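The er measurements behind the Catullus and Tibullus figures could be computed along the following lines. The Catullus figure uses the probability of er over consecutive 50-line samples, and the Tibullus figure uses the standard deviation of those probabilities. This is a hypothetical sketch. The poster does not specify any of the following, all of which are assumptions here: the helpers `er_probability` and `er_profile`, the use of non-overlapping samples, the letters-only preprocessing, and the choice of the population standard deviation.

```python
import statistics

def letters_only(text: str) -> str:
    # Assumption: lowercase and drop everything but letters, so bi-grams may
    # cross word and line boundaries.
    return "".join(ch for ch in text.lower() if ch.isalpha())

def er_probability(lines: list[str]) -> float:
    """P(r | e) = C(er) / C(e), where C(e) counts e as the first letter of a bi-gram."""
    chars = letters_only(" ".join(lines))
    er = sum(1 for a, b in zip(chars, chars[1:]) if a == "e" and b == "r")
    history = chars[:-1].count("e")
    return er / history if history else 0.0

def er_profile(lines: list[str], sample_size: int = 50):
    """Return (per-sample probabilities, mean, standard deviation)."""
    # Non-overlapping consecutive samples; a trailing partial sample is dropped
    # (assumption: the poster does not say how leftover lines were handled).
    samples = [lines[i:i + sample_size]
               for i in range(0, len(lines) - sample_size + 1, sample_size)]
    if not samples:
        raise ValueError("need at least one full sample of lines")
    probs = [er_probability(s) for s in samples]
    # Population standard deviation is an assumption; the poster reports
    # sigma without naming the estimator.
    return probs, statistics.mean(probs), statistics.pstdev(probs)
```

Calling `er_profile` separately on the lines of each book of Tibullus would yield one σ per book, which is the quantity compared in the Tibullus figure.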
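The split-corpus design described under A Comparison of Two Meters can be sketched as below. That design cuts each couplet into a hexameter half and a pentameter half, then measures word length, the nt bi-gram, and the um:am ratio on samples of 150 random words. The function names are hypothetical. The sketch also assumes the following, none of which the poster specifies: the input lines are whole couplets in order, nt is measured as P(t | n) inside words, and um:am is a ratio of raw counts.

```python
import random
from typing import Optional

def split_couplets(lines: list[str]) -> tuple[list[str], list[str]]:
    """Cut each couplet in two and return (hexameter half, pentameter half).

    Assumption: `lines` holds whole couplets in order, so even-indexed lines
    are hexameters and odd-indexed lines are pentameters.
    """
    return lines[0::2], lines[1::2]

def words_of(lines: list[str]) -> list[str]:
    # Assumed preprocessing: lowercase each word and strip non-letters.
    cleaned = ("".join(ch for ch in w.lower() if ch.isalpha())
               for line in lines for w in line.split())
    return [w for w in cleaned if w]

def cond_prob(words: list[str], bigram: str) -> float:
    """P(second letter | first letter), counting bi-grams inside words only."""
    hits = history = 0
    for w in words:
        for a, b in zip(w, w[1:]):
            if a == bigram[0]:
                history += 1
                if b == bigram[1]:
                    hits += 1
    return hits / history if history else 0.0

def meter_features(lines: list[str], n_words: int = 150,
                   seed: Optional[int] = None) -> dict[str, float]:
    """Features for one random sample of `n_words` words.

    random.Random.sample raises ValueError if there are fewer than n_words words.
    """
    sample = random.Random(seed).sample(words_of(lines), n_words)
    um = sum(w.count("um") for w in sample)
    am = sum(w.count("am") for w in sample)
    return {
        "word_length": sum(len(w) for w in sample) / n_words,
        "nt": cond_prob(sample, "nt"),
        # Assumption: um:am is a ratio of raw counts (inf if am never occurs).
        "um_am": um / am if am else float("inf"),
    }
```

After `hexameters, pentameters = split_couplets(lines)`, you can call `meter_features` on each half and on a poet's stichic hexameters to get directly comparable values. Fixing `seed` makes the random word sample reproducible.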