
Lexis meets meter: attraction of lexical units in Russian verse

Boris Orekhov1[0000-0002-9099-0436]

1 National Research University Higher School of Economics, Russia
[email protected]

Abstract. This paper deals with quantitative measures of the statistical attraction of words to a number of meters in Russian poetry. This attraction can be detected by means of Fisher's exact test. At the level of large corpora, where the meters are not differentiated by the number of steps, we failed to detect any semantic causes of the attraction. Nor does the prosodic structure of tokens explain the majority of the cases for which the statistics suggest attraction between words and meters. However, in the small subcorpora composed of lines of a particular meter differentiated by the number of steps, we are able to check whether the detected trends relate to the historical and literary reputation of a meter (for example, whether the meter is often used for folkloristic stylization or for the theme of remembrance).

Keywords: Verse studies, Meter, Poetry, Statistics.

1 Introduction

Formal and quantitative approaches to the study of poetry have a long history, particularly in the Russian academic community. A poetic line can be efficiently described with numeric parameters, and mathematical methods of analyzing poetry have been used since the beginning of the 20th century. The Russian poet and scientist Andrei Bely [Bely 1910] pioneered this research area in 1910 with his work on poetic symbolism. This methodology was further developed by B. Tomashevsky, G. Shengeli, the famous mathematician A. Kolmogorov and others. It reached its final form in the works of M. Gasparov, who constructed a system of poetic terms and ideas and introduced linguistic methods into poetic studies. Thus, he laid the cornerstone of a new discipline, which is now called “linguistics of poetry”.
As a matter of fact, combining traditional linguistics with poetics is far from commonplace in world scholarship. This approach has been called “the Russian method” since J. Baley (see [Korchagin 2011: 90]). Therefore, it would be fair to say that the recent trends in metrical analysis that are currently being employed worldwide merely follow in the footsteps of the Russian researchers and the work done by them in the last hundred years.

However, within the Russian body of research, there still remain less researched areas in linguistic poetics and metric studies. For example, the patterns of meter observed in Russian poetry are well known [Gasparov 2000] (see also [Smith 1985]); thorough analyses are available of poetic syntax and the morphological behavior of words in poetic lines [Gasparov & Skulacheva, 2004], as well as descriptions of different grammatical categories (e.g. verbal aspect and tense, agreement in person, etc.) [Kovtunova, 2005]. Due to the active development of lexicology in Russian linguistics, a large number of frequency dictionaries have been produced that describe the quantitative parameters of the lexis used in poetry [Shestakova, 2011]. Moreover, semantic phenomena in poetry, such as metaphor, metonymy and periphrasis, are also well described (though not quantitatively) [Grigorieva & Ivanova, 1985]. At the same time, there is hardly any research that statistically explains the interaction of linguistic and poetic phenomena, since such investigations require expertise in computational methods.

2 Data

A corpus of poetic texts was recently compiled as a part of the Russian National Corpus project. This is a fairly representative subcorpus, created by professional linguists and literary scholars. More information on its creation can be found in [Korchagin 2015]. The texts in the corpus are manually annotated with specifically designed metric markup.
The markup denotes the meter for every line and contains details about the number of feet found in the line and the clausula.

The use of the corpus for the study of poetic language is advantageous in several ways. Firstly, it allows us to analyze word distributions throughout a large collection, with works of different authors and from various epochs taken into account. Secondly, the corpus makes it possible to study lexical distributions not only from the perspective of time and authorship, but also in relation to different meters.

We took a dump of this corpus for our research; the dump consisted of 9,693,341 word tokens and included a representative collection of Russian poetic texts dating from the 18th century to the 1930s. During this time period, Russian poetry mainly belonged to accentual-syllabic versification. This system of versification exploits the alternation of stressed and unstressed syllables as its main source of rhythmic organization. Thus, a poetic line can be described as a repetition of a number of steps (feet), or syllabic groups, each bound to a stressed syllable. There are five common combinations of syllabic groups, or meters: iamb, trochee, dactyl, amphibrach and anapest. The first two meters are called disyllabic because their feet consist of one stressed and one unstressed syllable. The three other meters are called trisyllabic, as their feet are made up of one stressed and two unstressed syllables.

At the beginning of the 20th century, new principles of rhythmic organization emerged in Russian poetry. The most popular versification system of that period was dolnik, a non-accentual-syllabic form. This poetic form, although present in the corpus, was not included in the data for our research.

All in all, our data can be viewed as five independent subcorpora (we call them large corpora, as opposed to small corpora, where we also take into account the number of feet in each line), each one featuring only lines of a particular meter.
These subcorpora are not of equal size: iamb is the most popular meter and makes up the largest part of the corpus, namely 5,480,538 word tokens. The trochaic subcorpus consists of 1,593,554 word tokens, the dactylic of 402,434, the amphibrachic of 651,032, and the anapestic of 632,234 word tokens. Thus, the accentual-syllabic part of the entire poetic corpus totals 8,759,792 word tokens, and we can see that the majority of texts included in the poetic corpus use this versification system.

3 Goals and methodology

We aim to find the connections between meters and lexical units as manifested in Russian poetry in the course of its history. Obviously, such connections are not binary: we cannot say that a given word can be found only in lines of a particular meter and will not be present in any other meter. Almost any word can be found in poetic lines of different meters. For example, the word огонь ‘flame, fire’ can be found in iamb:

Свято`й ого`нь гори`т у вас в оча`х [F. N. Glinka. Греческие девицы к юношам (1821)]
‘Holy fire burns in your eyes’

and in trochee:

Дожига`й после`дние оста`тки Жи`зни, бро`шенной в ого`нь! [N. A. Nekrasov. «Ничего! гони во все лопатки...» (1854)]
‘Burn the last remnants of the life thrown into fire!’

and in all trisyllabic meters:

dactyl: В мо`ре не то`нет, в огне` не гори`т… [L. A. Mei. Оборотень (1858)]
‘In the sea does not sink, in the fire does not burn’

amphibrach: И со`лнце пыла`ло на не`бе огне`м [L. N. Trefolev. Маргаритка (1866-1889)]
‘And the sun blazed in the sky with fire’

anapest: В них ого`нь неземно`й Жа`рче со`лнца гори`т! [A. V. Kol'cov. Глаза (1835)]
‘Their ethereal fire burns hotter than the sun!’

As for words with low frequency, the peculiarities of their functioning in texts cannot be revealed with quantitative approaches because there is too little data.
The statistical methods we apply to our data (see below) cannot ensure reliable results on so few word occurrences.

When we discover connections between certain lemmas and meters, that does not mean that a lemma invariably belongs to a particular meter. Rather, we treat this regularity as the “attraction” of the lemma towards some meter. In other words, a lemma can occur in lines of different meters and still demonstrate a clear preference for a certain meter.

Tasks such as finding connections between lemmas and meters, or the more general task of exploring relationships between several variables, can be solved with Fisher’s exact test [Fisher 1922]. This test has been successfully applied in linguistic research, namely in collostructional analysis [Stefanowitsch & Gries 2003], which explores a similar phenomenon: the degree of connectivity between words, on the one hand, and constructions, on the other. As in the case of meters, slots in constructions can be filled with various words; however, Fisher’s test shows that the word distribution is not random, and we can say that some words tend to be attracted to certain constructions (for example, the verb сказать ‘to say/tell’ is attracted to the past tense [Rakhilina 2010: 37]).

In the present research we apply Fisher’s exact test to our task. In particular, we aim to reveal statistically significant connections between lemmas and meters as well as between certain tokens and meters. The connections are sought in the data from the large metric subcorpora (iambic, trochaic, dactylic, amphibrachic and anapestic, see “Data” above) and from smaller corpora of particular meter varieties (for example, the iamb with four steps, i.e. the iambic tetrameter, the iamb with five steps, etc.).
Exploring meter varieties may be of interest to historians of Russian poetry and to philological research in general, because it is widely known that meters with different numbers of steps have different functions and belong to different genres. At the same time, this part of the approach is statistically less rigorous, since smaller corpora do not yield reliable results.
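
To make the attraction measure more concrete, the following minimal sketch shows how a single lemma-by-meter comparison can be cast as a 2x2 contingency table (lemma vs. all other tokens, target meter vs. the remaining accentual-syllabic lines) and evaluated with Fisher's exact test. It is an illustration under stated assumptions, not the procedure actually used in this study: the function names are hypothetical, the lemma counts are invented for the example, the test is taken to be one-sided (only over-representation counts as attraction), and the p-value is computed directly from the hypergeometric distribution with the Python standard library. Only the two subcorpus sizes come from the Data section.

```python
from math import exp, lgamma


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_attraction_p(word_in_meter: int, word_total: int,
                        meter_size: int, corpus_size: int) -> float:
    """One-sided Fisher exact p-value for a word's attraction to a meter.

    Under the null hypothesis the number of occurrences of the word inside
    the meter subcorpus is hypergeometric. The p-value is the probability
    of seeing at least `word_in_meter` occurrences there by chance.
    """
    upper = min(word_total, meter_size)
    if not 0 <= word_in_meter <= upper:
        raise ValueError("word_in_meter is inconsistent with the margins")
    if word_total - word_in_meter > corpus_size - meter_size:
        raise ValueError("too many occurrences outside the meter subcorpus")

    log_denom = log_comb(corpus_size, word_total)
    p = 0.0
    # Sum the hypergeometric probabilities of all outcomes that are at
    # least as extreme as the observed one (log space avoids overflow).
    for k in range(word_in_meter, upper + 1):
        log_p = (log_comb(meter_size, k)
                 + log_comb(corpus_size - meter_size, word_total - k)
                 - log_denom)
        p += exp(log_p)
    return min(p, 1.0)


# Subcorpus sizes in word tokens, as reported in the Data section.
IAMBIC_SIZE = 5_480_538
ACCENTUAL_SYLLABIC_SIZE = 8_759_792

if __name__ == "__main__":
    # Hypothetical lemma with 1000 occurrences overall (invented counts).
    word_total = 1000
    for word_in_iamb in (620, 700):
        p = fisher_attraction_p(word_in_iamb, word_total,
                                IAMBIC_SIZE, ACCENTUAL_SYLLABIC_SIZE)
        print(f"{word_in_iamb} of {word_total} occurrences in iamb: p = {p:.3g}")
```

Since iambic lines make up about 63% of the accentual-syllabic tokens, a lemma with 1000 occurrences would be expected in iamb roughly 626 times by chance; only a count well above that level produces a small p-value. In collostructional analysis the p-value (or its negative logarithm) is commonly used to rank items by strength of attraction, and the same ranking could be applied here. The sketch also illustrates the caveat about low-frequency words and small subcorpora: with few occurrences, even a strongly skewed distribution may fail to reach significance.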