VALÉRIE BEAUDOUIN CREDOC and EHESS, FR
Total Page:16
File Type:pdf, Size:1020Kb
Beaudouin, Valérie & Yvon, François (1996). "The Metrometer : a Tool for Analysing French Verse", Literary & Linguistic Computing vol. 11, n°1, 23-32. The Metrometer : a Tool for Analysing French Verse VALÉRIE BEAUDOUIN CREDOC and EHESS, FR. FRANÇOIS YVON TELECOM PARIS, FR. tool capable of testing various hypotheses and theories Abstract of rhythm with quantitative methods. In this article, we present the "metrometer", a The metrometer takes any kind of text as an input, computing tool, capable of identifying, in any kind of and produces a phonemic transcription as an output, French text input, metrical components, that is together with some additional information about syllables. The choice of the syllable is very natural: syntactic components, etc. This transcription is French poetry is known to be mainly syllabic. More consistent with the rules of classical meter, and does specifically, our software performs a complete not aim at mimicking what actors may or may not say transcription of the input into the corresponding on stage. This transcription allows the identification in sequence of phonemes, using a transcribing module a given verse (or text) of the syllabic components, consistent with the specific phonology of classical which are the elementary units of French poetry. poetry, which allows the marking and counting of Various existing computing tools (a lexicon, a metrical syllables. Our system outputs as well with a syntactic analyzer, and a grapheme to phoneme series of useful markers (syntactic tags, word converter) have been extended to comply with the boundaries, etc.). This tool has been developed and numerous particularities of poetic reading. We will tested on a corpus containing all P. Corneille’s and J. shortly present these tools, before detailing how they Racine’s plays, totaling about 80 000 verses, and has were adapted to fulfill our requirements. The proved to be almost error free on these data. The modifications were mainly due to three major metrometer shall provide a very effective tool to tackle differences between syntax and phonology of verse the study of rhythm in French poetry with quantitative and prose: diaeresis, mute-e elision rules, and liaison. methods . Finally, we will give some preliminary results regarding the verse structure. 1 Introduction The computer tool, the metrometer, presented 2 System overview below has been developed with the specific purpose of While working on phonemic transcription exploring, on large verse corpora, the various surface problems, the need for a complete and accurate aspects under which rhythmic phenomena appear. We parsing of the input text progressively emerged. Such have primarily limited our scope to a very restricted an analysis is especially useful to disambiguate and rigidly codified form of verse, namely the heterophon homographs, and to address in detail the classical (XVIIth century) alexandrine. More problem of liaison in the verse. At the current stage of precisely, this tool has been developed and tested on a our work, a simple part-of-speech tagger would have huge sample of such verses, the complete theater work been enough to solve most of our difficulties; it is of P. Corneille, and J. Racine1. The total corpora size however obvious that a full syntactic analysis would is about 80 000 verses, which represent 700 000 eventually be a necessary step for any serious analysis words. The reasons for this restricted poetic form are of rhythm. This explains why our transcription module two fold: first, the classical alexandrine is strongly corresponds to an independent layer on top of the constrained, and thus allows systematic and automated syntactic analysis system developed at TELECOM treatment. Second, we expect our description of this PARIS by P. Constant (1991). typical verse to appear as a reference for diachronic We shall present the design of this module, before studies of French verse. explaining how text strings are converted into Our work is directly inspired by previous analyses phoneme sequences. of rhythm, and their application within the framework 2.1 Syntactic analysis of verse proposed by J. Roubaud (1978, 1986 and 1991) and P. Lusson (1973), and aims at providing a 2 P. Constant's syntactic analyser radically differs enriched by two additional data. The first one is from most known parsing systems of French. It relies strictly lexical and flags words that are subject to upon the strong assumption that syntactical ambiguity diaeresis. The second one is the result of an analysis is a major property of natural languages, and that, of the syntactic context to assess the degree of consequently, ambiguities should not be regarded and connection between the current word and its successor treated as accidental events. His system therefore (should there be a liaison, a break, etc.?). offers an effective and structural way to handle The second step consists in sequentially applying a ambiguities. series of phonological derivations, leading to a final The second unusual aspect of his model is that the representation of the input, which can be viewed as a concept of a "well-formed" sentence, a central element kind of (quite abstract) surface form. Our system of the theory of formal grammars is not relevant. On knows several such sets of phonological rules, and can the contrary, the system is able to produce a parse for produce several surface forms, corresponding to any kind of sentence, even incorrect ones. In this different styles of speech (formal, informal, etc.). respect, it is robust to local orthographic These rules mainly follow the description of the "catastrophes", such as in 'Un petit chats'2 (a small French phonological system given in Dell (1985). A cats). This proved to be for us a valuable feature, for detailed presentation of this transcription system is one main reason: our maximal analysis unit being the given in Yvon (1993). verse, we needed a system capable of processing such Working out a simple example should clarify the units, which do not always correspond to complete roles of the various components of the transcription sentences. There is of course another good point in module. Let's consider the processing of the following using such a system: orthographic typos, also present input: "le chien mange" (the dog is eating), sketched in our corpus can safely be ignored. on table 1. The analysis proceeds by applying sequentially a series of heuristics, each heuristic or group of Rule New representation heuristics functioning as an independent layer. The Rewrite rules first layer concerns text segmentation and lexical ("le", article) -> l\ l\ |6 chien mange access3, leading to a situation of maximal ambiguity, ("chien", noun) -> ßi´n l\ | ßi´n || mange where each token is assigned all its possible syntactic ("mange", verb) -> | || tags. Upper layers will progressively reduce the måné\ l\ ßi´n måné\ number of interpretations of the input sentence, by Phonological rules application of very specific and local rules. For semi-vocalization of high l\ | ßj´n || måné\ instance, a group of such heuristics (a layer) is vowels responsible for solving noun/verb ambiguities, another nasalization + nasal l\ | ßj´~ || må~~é\ one is dedicated to the structuration of verb groups, consonant delation etc. The reduction of the number of possible parses deletion of final schwa l\ | ßj´~ || må~~é goes together with the refinement of the syntactic structuration of the input. This analyser is still Table 1 : grapheme to phoneme conversion evolving, and is currently made up of about forty The last step of treatment is quite straightforward, layers. and consists in identifying the syllables in the output Such a strategy of analysis also offers the phoneme stream. possibility to develop new layers. Besides the Adapting our existing transcription system to transcription module, which is one such layer, we metrometrics mainly involved the modification of this seriously consider the development of a layer phonological module, essentially the development of a specifically devoted to the recognition and analysis of few verse specific phonological rules, which will be the "metaposition"4 (Ronat, 1975), which appears to examined in the next section; the rest of the system be a major stylistic figure of classical verse. remained almost unchanged. 5 It shall finally be noted that automating the 2.2 Grapheme to phoneme conversion grapheme to phoneme transcription of texts implies a The grapheme to phoneme transcription system we certain number of arbitrary choices, from the used follows a two-step procedure. Each pair (word, identification of a valid phonetic alphabet, to the tag) is first transcribed in isolation into an abstract selection of a "correct" set of phonological phonemic representation, very similar in its spirit to a derivations, and of the detailed identification of their undelying form. This transcription is achieved by the respective scope. Our transcriptions shall be therefore application of an ordered set of rewriting rules. It is regarded as abstract phonemic transcriptions, 3 produced for the sole purpose of counts. Neither instance, the word “diamant” (diamond) should be allophonic variations, nor prosodic marks (length, transcribed as / d j a m å¤ / to reflect the standard strength, height) have been elicited, which distinguish pronunciation, and as / d i a m å¤/ to be consistent our system's output from what a "true" speech with verse reading. In the first case, the group “ia” synthesis system would have produced. will count for one metrical position, and for two in the The following figure summarizes the complete second. processing of the first verse from "La Thébaïde" by If we now consider global sentence pronunciation, Racine. most differences between the two styles of readings are easily expressed in the general framework given Ils sont sortis , Olympe ? ah mortelles douleurs! by Milner and Regnault (1987). As we shall see, their assumption is that the whole verse, in poetic reading, Lexicon : should be given the same role as the phrase in words and compound nouns + standard diction, i.e.