<<

SHORT TERM SCIENTIFIC MISSION (STSM) SCIENTIFIC REPORT

This report is submitted for approval by the STSM applicant to the STSM coordinator

Action number: CA16204, Distant Reading for European Literary History
STSM title: Mixed Novel Genres in 19th-Century Romanian Fiction: “city mysteries” embedded in “Hajduk” settings
STSM start and end date: 01/03/2019 to 20/03/2019
Grantee name: Roxana PATRAS

PURPOSE OF THE STSM:
The STSM aimed to contribute to the scientific objectives of CA16204 through the following: a. refining the list of Romanian novels according to ELTeC criteria and to the current status of the resources available in Romanian libraries; b. testing computational methods of literary text analysis on a less-resourced language; c. proceeding with the digitization of Romanian novels, with a focus on city mysteries.

The purpose of the STSM was to track down, through computational methods, structural and/or stylistic interactions between two varieties of historical novels (presumably “sub-genres”) that have become increasingly distinct in the Romanian literary tradition: the Hajduk novels, an original, locally inspired novel cluster (H), and the City-Mysteries, a variety of novels consisting predominantly of French imitations (CM). Given the difficulties involved in the clean-up and normalization of nineteenth-century Romanian texts, the objectives of this research stay were revised as follows:

1. Becoming familiar with Transkribus and training a Handwritten Text Recognition (HTR) model on 5 difficult texts printed before 1865 in the so-called Romanian “transition alphabet” (Cyrillic + Latin characters);
2. Learning to use “Stylometry with R” (Stylo) and checking its performance on nineteenth-century Romanian texts;
3. Validating or invalidating theoretical assumptions about the generic homogeneity of the 2 clusters through computational methods of literary text analysis.

DESCRIPTION OF WORK CARRIED OUT DURING THE STSM
The research activities were organised into 3 packages according to the STSM work-plan:

a. Designing effective strategies for the automatic clean-up of historical varieties of Romanian
a1. introduction to Transkribus functions; uploading of texts to a collection entitled “Alfabet Tranzitie”;
a2. setup of transcription principles: turning Cyrillic into Latin letters, preserving Latin letters, keeping punctuation as close as possible to the punctuation on the printed page;
a3. designing a transliteration standard for all situations where Cyrillic letters correspond to Latin letters (Annex 1);
a4. customization of Latin characters that are not present in current Romanian (Annex 1);
a5. choice of 5 texts, sampling varieties of prints from different time-slots; choice of 51 sample pages;
a6. transcription of 51 pages;
a7. contacting the Transkribus team and asking for support in training a model;
a8. devising methods of converting the nineteenth-century glyphs “ș/ Ș” and “ț/ Ț” into standard UTF-8 characters.
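Step a8 can be illustrated with a minimal Python sketch. The report does not say which non-standard glyphs the sources contain, so the sketch rests on an assumption: that the problem characters are the cedilla forms of s and t (U+015E, U+015F, U+0162, U+0163) and that the target is the comma-below forms used in current Romanian orthography. The function name `normalize_s_t` is hypothetical.

```python
import unicodedata

# Assumption: the glyphs to be replaced are the cedilla forms; the targets are
# the comma-below letters of current Romanian orthography.
CEDILLA_TO_COMMA_BELOW = str.maketrans({
    "\u015E": "\u0218",  # S with cedilla -> S with comma below
    "\u015F": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021A",  # T with cedilla -> T with comma below
    "\u0163": "\u021B",  # t with cedilla -> t with comma below
})


def normalize_s_t(text: str) -> str:
    """Return `text` with cedilla s/t replaced by the comma-below forms."""
    # NFC first, so that a base letter followed by a combining mark
    # becomes a single precomposed character before the table is applied.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(CEDILLA_TO_COMMA_BELOW)


# The string below is "si tara" spelled with cedilla s and t.
sample = "\u015Fi \u0163ar\u0103"
cleaned = normalize_s_t(sample)
```

Writing `cleaned` back with `open(path, "w", encoding="utf-8")` would then give a UTF-8 file that contains only the standard letters.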

b. Checking the performance of tools such as Stylo R on Romanian texts from the two clusters (31 H and 41 CM) and an experimental set of 70 contemporary Romanian novels
b1. checking UTF encoding of files with EncodeAnt;
b2. renaming files according to guidelines in the package for stylometric analyses;
b3. preparing corpora, primary sets and secondary sets for various types of analyses;
b4. clustering/ bootstrap consensus tree/ PCA of H corpus:
  - clustering/ bootstrap consensus tree/ PCA of short H novels (under 200k)
  - N.D.Popescu vs Imitators (. Ighel, A. Marcu, P. Popescu, T.M. Stoenescu, . Stoenescu)
  - Anonymous H author vs Famous H authors
  - Female H authors (B. Dumbrava) vs Male H authors
  - Novels sharing the same H hero (Iancu Jianu in Bucura Dumbrava’s, N.D. Popescu’s and A. Marcu’s novels)
  - H novels published by the same publishing house (Steinberg, Lazar, Dor. P. Cucu)
  - Case of presumptive siblings or pen-name: T.M. Stoenescu and Stefan Stoenescu (see also Thoma I. Stoenescu, the editor of Catastihul amorului).
b5. clustering/ bootstrap consensus tree/ PCA of CM corpus. Results:
  - parody of CM novels (Catastihul amorului) vs. regular CM
  - stability of the 1860 writing style in the cluster formed by D. Bolintineanu, I. Bujoreanu, R. Ionescu -/+ Gr. Grandea
  - stability of the cluster formed by the item of uncertain paternity (Catastihul amorului, presumably written by R. Ionescu) and Radu Ionescu’s Don Juanii de Bucuresti
  - female CM authors (Smara) vs Male CM authors
  - authors of “sensational” H novels and CM novels

c. Scientific networking and opportunities for project follow-ups
c1. participation in the weekly lab meetings of the computational linguistics group (supervised by prof. Mike Kestemont);
c2. exploratory discussions with respect to future projects and collaborations between the Antwerp Center for Digital Humanities and Literary Criticism (ACDC), University of Antwerp, and the Institute for Interdisciplinary Research, “Alexandru Ioan Cuza” University of Iasi.
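The cluster analyses listed under b4 and b5 rest on distances between the most-frequent-word profiles of the texts. The report does not state which distance measure or how many most frequent words were used, so the following Python sketch is only an illustration of the general idea, assuming Burrows's Delta (one measure commonly used in stylometry). It is not a reimplementation of Stylo, and the labels and token lists in `toy` are invented for the example.

```python
from collections import Counter
from statistics import mean, stdev


def burrows_delta(texts, n_mfw=100):
    """Pairwise Burrows's Delta for `texts`, a dict mapping a label to a token list.

    Assumes at least two texts and no empty token lists.
    """
    names = list(texts)
    # 1. Pick the n most frequent words (MFW) of the whole corpus.
    corpus = Counter(tok for toks in texts.values() for tok in toks)
    mfw = [word for word, _ in corpus.most_common(n_mfw)]
    # 2. Relative frequency of each MFW in each text.
    rel = {}
    for name in names:
        counts = Counter(texts[name])
        rel[name] = [counts[w] / len(texts[name]) for w in mfw]
    # 3. z-score every word across the texts (words with no variance get 0).
    z = {name: [] for name in names}
    for i in range(len(mfw)):
        column = [rel[name][i] for name in names]
        mu, sigma = mean(column), stdev(column)
        for name in names:
            z[name].append((rel[name][i] - mu) / sigma if sigma else 0.0)
    # 4. Delta = mean absolute difference between two z-score profiles.
    return {
        (a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented toy data: two identical "H" samples and one different "CM" sample.
toy = {
    "H_a": "si a fost si a".split(),
    "H_b": "si a fost si a".split(),
    "CM_c": "dar nu a fost dar nu".split(),
}
distances = burrows_delta(toy, n_mfw=3)
```

In this toy case the two identical samples are at distance zero and the third sample lies further away; a hierarchical clustering or a consensus tree is then built from such a distance table.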

DESCRIPTION OF THE MAIN RESULTS OBTAINED
The main results of this research stay are:
1. creation of a digitized sub-corpus of 41 city mysteries, some of which will be included in ELTeC;
2. training of an HTR model for the Romanian transition alphabet in Transkribus;
3. transfer of practices and methodologies devised for the computational analysis of literary texts (Stylo);
4. drafting a paper on transliteration and transcription principles for the Romanian transition alphabet;
5. drafting a paper on N.D. Popescu and the imitators of the Hajduk style;
6. drafting a paper on the attribution of Catastihul amorului to Radu Ionescu;
7. organising the results in two folders, results_haiduci and results_mystery.

Even though they still need to be refined, the results indicate that the H and CM clusters preserve a relative autonomy in spite of their common sensational elements; they can thus be defined as “sub-genres/ novel species”. This may lead to subsequent hypotheses such as:
- the H cluster looks more homogeneous because its authors are prone to imitation and even mannerism;
- the CM cluster looks heterogeneous (with only one relatively stable group, the novels written around 1860), so it might yield better results if topic modelling were applied;
- Hajduk heroes do not mix with city-mysteries criminals, so each cluster (genre) might formalize a different approach to crime;
- Bucura Dumbrava’s contribution to the Romanian translation of Haiducul (Teodor Nica is the official translator of the German original Der Haiduk) might be demonstrated through the book’s relative mobility in the genre’s visualisations (female vs. male writing);
- the case of Radu Ionescu’s uncertain authorship of Catastihul amorului can finally be settled: in 1986, the literary historian and editor D. Balaet advanced the hypothesis that R. Ionescu might be the author, yet he did not rush to attribute the text;
- in the case of the anonymous H novel (signed with the initials “MCP”), authorship attribution remains problematic, as the H corpus seems not to contain other works by the same author.

FUTURE COLLABORATIONS (if applicable)
The STSM grantee and the supervisor of the scientific mission will collaborate further in CA16204, as well as in other projects that involve the digitization of cultural heritage and the participation of peripheral cultures in broader European initiatives.


Annex 1

Texts in the Romanian Transition Alphabet (1830-1862): Historical Context and Transcription Principles

 The Romanian transition alphabet is a combination of Cyrillic and Latin characters that was used in print between approx. 1830, after the end of Phanariote rule and the establishment of the Russian protectorate (the Organic Regulation), and approx. 1862, when the first ruler of the Romanian United Principalities (Moldavia and Wallachia) passed a law obliging typographers to use only Latin glyphs. The Cyrillic and Latin letters blend in a way that makes Romanian printed texts (press, legal documents, and fiction) resemble avant-garde Dada poetry, and they are not distributed evenly across the 3 decades of transition. Established through typographers’ habits and practices rather than through convention or linguistic standards, the rules for using Latin instead of Cyrillic are largely unwritten and depend entirely on the location of the printing houses. Compared to the Wallachian prints (Bucharest), the Moldavian ones (Iasi) show certain dialectal particularities. Moreover, random samples show that the share of Cyrillic letters decreases slowly over time. However, for some consonants, chiefly palatals, both Cyrillic and Latin letters may be used in one and the same text: K/k and Ч/ч (che); G/g, Г/г (ghe) and Џ/џ (dze); S/s and С/с.

 In order to train an HTR model for these texts, I chose 5 samples that show, before and after 1859 (when the 2 Romanian provinces became a country with an official language), the progression from a massive use of Cyrillic letters to a more reader-friendly one, which makes reading more fluent. The paratextual information characterizes the 5 texts as “original” or “historical” novels. In fact, given that the first Romanian novel was published in 1845 by a mysterious author signing D.F.B. (Elvira sau amorul fără de sfârşit. Romans original / Elvira or the Neverending Love. Original Romance), the Romanian literary tradition regards them as founding pieces of a sort:

1. PELIMON Al., Hoţii şi Hagiul. Roman istoric, Buc., Tip. Sfintei Mitropolii, 1853, 117 p.
2. BOERESCU Costache, Aldo şi Aminta sau Bandiţii, Buc., Tip. Bisericească din Sf. Mitropolie, 1855, 164 p.
3. PELIMON Al., Jidovul cămătar. Moldova şi Bucovina, Buc., Tip. Stephan Rassidescu, 1863, 292 p.
4. PELIMON Al., Bucur, istoria fundării Bucureştilor, Buc., Tip. Nationala Iosif Romanov, 1858, 251 p.
5. ARICESCU C.D., Misterele căsătoriei, I. Bărbatul predestinat, Buc., Tip. Stephan Rassidescu, 1861, 179 p.

 As a general rule, Latin capital letters are preferred for titles after 1859. The Latin letters Z/z, M/m, D/d, S/s, T/t, N/n, A/a, I/i, E/e, O/o, Î/î, U/u, Ŭ/ŭ, Ĭ/ĭ are present from the oldest sampled text (1853) onwards, alongside the Cyrillic letters Х/х (ha), Ш/ш (sha), Щ/щ (shcha), Ц/ц (tze), Џ/џ (dze), Ч/ч (che), Ъ/ъ (ă), П/п (pe), Р/р (er), Ж/ж (zhe), Ф/ф (ef), К/к (ca), В/в (ve), Л/л (el), Г/г (ghe), Б/б (be). Among these Cyrillic letters, the first to receive a Latin equivalent are: Ф/ф (ef) → f; Г/г (ghe) → g; Л/л (el) → l; Ж/ж (zhe) → j. By contrast, Р/р (er), П/п (pe), Ъ/ъ (ă), Ч/ч (che), В/в (ve), Ш/ш (sha), Щ/щ (shcha), Ц/ц (tze) tend to be maintained until 1862, when some of them are replaced with glyphs such as “ḑ” (dz), “ş” (sh) and “ț” (tz), which were imported from the Livonian alphabet but entered the printing circuit only after 1865.

 The general guidelines for transcription have been established as follows:

1. Creation of the collection “ALFABET DE TRANZITIE” containing 6 items.

2. Random transcription of initial, middle, and end pages:
2.1. PELIMON Al., Hoţii şi Hagiul. Roman istoric (1853): pages 6, 7, 8, 52, 53, 54, 76, 77, 78, 79, 80.
2.2. BOERESCU Costache, Aldo şi Aminta sau Bandiţii (1855): pages 8, 9, 10, 36, 37, 38, 114, 115, 116, 148, 150.
2.3. PELIMON Al., Jidovul cămătar. Moldova şi Bucovina (1863): pages 5, 6, 7, 8, 49, 50, 51, 100, 101, 102.
2.4. PELIMON Al., Bucur, istoria fundării Bucureştilor (1858): pages 5, 6, 7, 84, 85, 86, 87.
2.5. ARICESCU C.D., Misterele căsătoriei, I. Bărbatul predestinat (1861): pages 1, 2, 56, 57, 58, 97, 98, 99, 133, 135, 136.

3. One-to-one transliteration of all Cyrillic letters, except where K/k stands for the group Ch/ch (e.g. Бukete → Bukete): Х/х → H/h; Ш/ш → Ș/ș; Щ/щ → Șt/șt; Ц/ц → Ț/ț; Ч/ч → C/c; Ъ/ъ → Ă/ă; П/п → P/p; С/с → S/s; Р/р → R/r; Ж/ж → J/j; Ф/ф → F/f; К/к → C/c; В/в → V/v; Л/л → L/l; Г/г → G/g; Б/б → B/b; Џ/џ → G/g.
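The correspondence list in item 3 can be written down as a lookup table. The Python sketch below is an illustration only and is not the procedure used in Transkribus. It assumes that “C/c → S/s” refers to the Cyrillic letter С/с, and that Latin letters (including the Latin K/k of the exception) are simply left untouched. The names `CYR_TO_LAT` and `transliterate` are hypothetical.

```python
# Cyrillic -> Latin pairs taken from item 3 of the transcription guidelines.
# Code points are written as escapes because several Cyrillic letters
# look identical to Latin ones on screen.
CYR_TO_LAT = str.maketrans({
    "\u0425": "H",            "\u0445": "h",            # Cyrillic ha
    "\u0428": "\u0218",       "\u0448": "\u0219",       # sha   -> S/s with comma below
    "\u0429": "\u0218t",      "\u0449": "\u0219t",      # shcha -> two letters (St/st)
    "\u0426": "\u021A",       "\u0446": "\u021B",       # tze   -> T/t with comma below
    "\u0427": "C",            "\u0447": "c",            # che
    "\u042A": "\u0102",       "\u044A": "\u0103",       # hard sign -> A/a with breve
    "\u041F": "P",            "\u043F": "p",            # pe
    "\u0421": "S",            "\u0441": "s",            # Cyrillic es (assumed reading of "C/c")
    "\u0420": "R",            "\u0440": "r",            # er
    "\u0416": "J",            "\u0436": "j",            # zhe
    "\u0424": "F",            "\u0444": "f",            # ef
    "\u041A": "C",            "\u043A": "c",            # Cyrillic ka
    "\u0412": "V",            "\u0432": "v",            # ve
    "\u041B": "L",            "\u043B": "l",            # el
    "\u0413": "G",            "\u0433": "g",            # ghe
    "\u0411": "B",            "\u0431": "b",            # be
    "\u040F": "G",            "\u045F": "g",            # dze
})


def transliterate(text: str) -> str:
    """Replace the Cyrillic letters of the table; leave everything else as is."""
    return text.translate(CYR_TO_LAT)


# The example from item 3: Cyrillic be followed by Latin "ukete".
example = transliterate("\u0411ukete")
```

Because Latin characters are not in the table, the Latin k in the example passes through unchanged and the result is Bukete, as in the guidelines.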

4. Customization of the following glyphs: apostrophe, right double quotation mark, double low-9 quotation mark, Ŭ/ ŭ, Ĭ/ ĭ, á.

 Further use: This trained model can help to automate the transliteration task, which is known to be time-consuming and has so far been done only manually. Given that the digitization of Romanian nineteenth-century resources started with the press, in the second stage of this research I aim to test this model on non-fictional texts, such as legal documents and press articles, some of which are already available in digitized format on dacoromanica (http://www.digibuc.ro/colectii/publicatii-periodice-c1574), transsilvanica (http://documente.bcucluj.ro/) and scoala ardeleana (http://documente.bcucluj.ro/scoala/).
