Variation and Text Type in Old Occitan Texts

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Christin Michelle Laroche Wilson, M. A.

Graduate Program in Linguistics

The Ohio State University

2012

Dissertation Committee:

Brian Joseph, Adviser

Daniel Collins

Hope Dawson

Dieter Wanner

Copyright by

Christin Michelle Laroche Wilson

2012

Abstract

Although there is a fairly large corpus of Old Occitan texts, the majority of the linguistic analysis on the language has been done using only one type of text: the lyric poetry of the troubadours, though this poetry accounts for only about ten percent of the total Old Occitan corpus. Our understanding of the language and its development is thus less complete and accurate than it could be if all of the types of texts that constitute the full corpus were considered more fully. This study seeks to bridge that gap by considering whether analyzing the understudied prose and non-lyric poetry texts uncovers the same variants and patterns as the lyric poetry, or to what extent these vary between text types. The publication of the Concordance de l’Occitan Médiéval (COM), which includes the entirety of the Old Occitan corpus for the first time, allows the prose and non-lyric poetry texts to be searched and analyzed digitally. Very little quantitative work has been done concerning the patterns of variation within the Old Occitan texts, but the creation of the COM makes such studies possible. Using this corpus and taking previous research on the language as a starting point (e.g. Jensen 1976, Anglade 1977), this study compares the attestations and patterns of use of phonological and morphological features among the three major types of text in Old Occitan: lyric poetry, non-lyric poetry, and prose. By considering these features both quantitatively and qualitatively, I seek to further understand the relationship between variation and text type, particularly with reference to the representation of sound change in progress.

Three aspects of the Old Occitan language are investigated: the use of analytic and synthetic comparative adjective forms, the formation of adjectives using various derivational suffixes, and the development of the glide-initial diphthongs. My findings show that text type plays an important role in the patterns of variation found within the texts; the patterns of variation of all three features investigated were significantly different in each type of text. For example, the synthetic comparative formation of some adjectives is far more common in the lyric poetry than in either the non-lyric poetry or the prose. Similarly, the variants of the front mid vowels are significantly more common in the lyric poetry than in the other types of texts. The type of text, however, is not the only factor to influence the use of these aspects of the language. Instead, text type differences interact with differences based on the date of the text, the dialect or geographic location of the text, and other parameters to create a complex web of associations and tendencies to which Old Occitan writers were sensitive.

Dedication

In memory of my grandmother, Anna

Acknowledgements

I owe a great debt of gratitude to many people for their guidance and support in the course of my doctoral studies and, more specifically, in the course of this project.

First, I would like to thank my advisor, Brian Joseph. I have continuously benefitted from his patience, his guidance, his generosity, and his insight. He has always been available and willing, no matter the time or the location, to discuss the big ideas, the fussy little details, and anything in between. I would also like to thank my committee, Dieter Wanner, Dan Collins, and Hope Dawson, for their feedback and encouragement during this study. I would especially like to thank Hope for her encouragement and her insightful and detailed feedback not only on this dissertation but also on my teaching; I have greatly benefitted from her generosity and attention to detail.

I also owe Peter Ricketts a great debt; I would like to thank him for his generosity, his time, his knowledge, and his willingness to share his library and the texts of the unpublished COM3 with me. Without his assistance, this study would not have been possible. I would also like to thank Massimiliano de Conca for sharing his work and the side-by-side manuscript variants with me.

In addition, I would like to thank the OSU Linguistics Department, which has provided a rich environment in which I have grown as a linguist, a scholar, and a teacher.

I am deeply thankful for the many linguists at OSU who have taught and encouraged me, for my fellow graduate students with whom I have had wonderful conversations and among whom I have made some very dear friends, and for my students, from whom I have learned, as well.

On a more personal level, I would like to thank my family and friends, who have been a diverse but constant source of support over the years. Their prayers, conversations, celebrations, and laughter have helped me in countless ways on countless occasions.

Specifically, I want to thank my mother and Lacey, for their unwavering support and encouragement even when they didn’t understand what I was going through or what I was talking about. Finally, I want to thank my husband, Jon, for everything.

Vita

2006…………………………………………………B.A., University Scholars, Baylor University

2006…………………………………………………Graduate School Fellowship, The Ohio State University

2010…………………………………………………M.A., Linguistics, The Ohio State University

2007 to present………………………………………Graduate Teaching Associate, Department of Linguistics, The Ohio State University

Publications

Mihalicek, V. & Wilson, C. (Eds.). 2011. Language files: materials for an introduction to language and linguistics. 11th ed. Columbus, OH: The Ohio State University Press.

Fields of Study

Major Field: Linguistics

Table of Contents

Abstract ...... ii

Dedication ...... iv

Acknowledgements ...... v

Vita ...... vii

Table of Contents ...... viii

List of Tables ...... xii

List of Figures ...... xv

Chapter One: Introduction ...... 1

1.1 Introduction ...... 1

1.2 Focus of Previous Research on Old Occitan ...... 3

1.3 The Old Occitan Corpus ...... 4

1.3.1 The Manuscript Tradition ...... 5

1.3.2 Previous Editions and Availability ...... 6

1.3.3 The Concordance de l’Occitan Médiéval ...... 7

1.4 Text Types and Linguistic Analysis ...... 9

1.5 Text Types in Old Occitan ...... 11

1.5.1 Lyric Poetry ...... 12

1.5.1.1 Genre in Old Occitan ...... 14

1.5.2 Non-lyric Poetry ...... 15

1.5.3 Prose ...... 17

1.6 Why Study the Prose Texts? ...... 18

1.7 Other Methods of Classifying Old Occitan Texts ...... 21

1.7.1 Date ...... 22

1.7.2 Dialect ...... 28

1.8 Challenges in Studying a Historically Attested Language ...... 37

1.8.1 Concerns when Studying Written Texts ...... 39

1.8.2 Authenticity ...... 42

1.9 Variation in Old Occitan Texts ...... 45

1.10 Dissertation Overview ...... 47

Chapter Two: Comparative Adjectives ...... 49

2.0 Comparative Adjectives ...... 49

2.1 Comparative Adjectives in Grammars and Descriptions of Old Occitan ...... 49

2.2 Comparative Adjectives in the Concordance de l’Occitan Médiéval ...... 56

2.2.1 Methodology ...... 56

2.2.2 Results ...... 59

2.2.3 Doubly Marked Comparatives in the COM ...... 65

2.3 Analysis of Comparative Adjectives by Dialect of Text ...... 68

2.4 Analysis of Comparative Adjectives by Date of Text ...... 71

2.5 Analysis of Comparative Adjectives by Text Type ...... 76

2.6 Conclusion ...... 88

Chapter Three: Adjective Derivation ...... 91

3.0 Adjective Derivation ...... 91

3.1 Adjective Derivation in Grammars and Descriptions of Old Occitan ...... 91

3.1.1 Derivational Suffixes to be Analyzed ...... 92

3.1.1.1 -al ...... 93

3.1.1.2 -art...... 94

3.1.1.3 -enc ...... 96

3.1.1.4 -esc ...... 97

3.1.1.5 -ivol ...... 99

3.2 Name Creation in Old Occitan ...... 100

3.3 Derived Adjectives in the Concordance de l’Occitan Médiéval ...... 102

3.3.1 Methodology ...... 102

3.3.2 Results ...... 107

3.3.2.1 Adjectives ...... 109

3.3.2.2 Names ...... 111

3.3.2.3 -al Revisited ...... 114

3.4 Analysis of Derived Adjectives by Dialect of Text ...... 116

3.5 Analysis of Derived Adjectives by Date of Text ...... 129

3.6 Analysis of Derived Adjectives by Text Type ...... 143

3.7 Conclusion ...... 151

Chapter Four: Glide-Initial Diphthongs ...... 153

4.0 Glide-Initial Diphthongs ...... 153

4.1 Glide-Initial Diphthongs in Grammars and Descriptions of Old Occitan ...... 154

4.2 The Multiple Manuscript Problem ...... 162

4.3 Glide-Initial Diphthongs in the Concordance de l’Occitan Médiéval ...... 168

4.3.1 Methodology ...... 168

4.3.2 Results ...... 173

4.3.3 [wo] and [we] ...... 174

4.4 Analysis of Glide-Initial Diphthongs by Dialect of Text ...... 175

4.5 Analysis of Glide-Initial Diphthongs by Date of Text ...... 185

4.6 Analysis of Glide-Initial Diphthongs by Text Type ...... 192

4.7 Conclusion ...... 202

Chapter Five: Conclusion ...... 205

5.1 Summary and Discussion ...... 205

5.2 The Waldensian Dialect ...... 209

5.3 The Composition of the Lyric Poetry ...... 211

5.4 The Contribution of the Prose Texts ...... 218

5.5 Variation and Grammar ...... 220

5.6 Conclusion ...... 223

References ...... 225

Appendix A: Index of Texts used in Examples ...... 232

Lyric Poetry Texts ...... 232

Non-Lyric Poetry Texts ...... 233

Prose Texts ...... 234

List of Tables

Table 1.1 Number of Texts and Words by Date…………………………………………25

Table 1.2 Number of Non-Lyric Poetry Texts and Words by Date……….……………..26

Table 1.3 Number of Prose Texts and Words by Date………………………………...... 27

Table 1.4 Number of Texts and Words by Dialect………………………………………33

Table 1.5 Number of Non-Lyric Poetry Texts and Words by Dialect…………………...34

Table 1.6 Number of Prose Texts and Words by Dialect……………………...………...35

Table 2.1 Positive and Synthetic Comparative Adjectives (based on Grandgent 1905)...53

Table 2.2 Comparative and Synthetic Adjectives in the Entire COM Corpus…………..61

Table 2.3 Synthetic Comparative Adjective Forms by Dialect of Text.………………....69

Table 2.4 Synthetic Comparative Adjective Forms by Date of Text.……………………73

Table 2.5 Synthetic Comparative Adjective Forms by Text Type………………………78

Table 2.6 Synthetic Comparative Adjective Forms by Text Type with Mention Attestations Removed…...………………………………………..………………….82

Table 3.1 Derived Adjectives in the Entire COM Corpus………………….…………..108

Table 3.2 Adjectives in the Entire COM Corpus…………………………………….....110

Table 3.3 Names in the Entire COM Corpus…………………………………………...111

Table 3.4 Total Derived Lemmas by Dialect of Text………………………..…………117

Table 3.5 Total Derived Tokens by Dialect of Text……………………………………119

Table 3.6 Adjective Lemmas and Tokens by Dialect of Text………………………….122

Table 3.7 Tokens of Roots Which are Derived with Multiple Suffixes Found with Each Suffix in Each Dialect…………………………...………………………………….125

Table 3.8 Name Lemmas and Tokens by Dialect of Text……………………………...126

Table 3.9 Total Derived Lemmas by Date of Text……………………………………..130

Table 3.10 Total Derived Tokens by Date of Text…………………………………..…134

Table 3.11 Adjective Lemmas and Tokens by Date of Text…………………………...136

Table 3.12 Name Lemmas and Tokens by Date of Text……………………………….140

Table 3.13 Total Derived Lemmas and Tokens by Text Type………………………....143

Table 3.14 Adjective Lemmas and Tokens by Text Type……………………………...147

Table 3.15 Name Lemmas and Tokens by Text Type……………………………….....149

Table 4.1 Total Diphthong and Monophthong Forms in the Entire COM Corpus……..174

Table 4.2 Front and Back Diphthongs by Dialect of Text……………………….……..176

Table 4.3 Front Diphthongs by Dialect of Text ………………………………...……...177

Table 4.4 Back Diphthongs by Dialect of Text………………………………………...181

Table 4.5 Number and Percentage of [wo] and [we] Variants by Dialect of Text……..184

Table 4.6 Front and Back Diphthongs by Date of Text………………………………...186

Table 4.7 Front Diphthongs by Date of Text………………………………………...…187

Table 4.8 Back Diphthongs by Date of Text…………………………………………...189

Table 4.9 Number and Percentage of [wo] and [we] Variants by Date of Text………..191

Table 4.10 Front and Back Diphthongs by Text Type……………………………….....192

Table 4.11 Front Diphthongs by Text Type…………………………………………….195

Table 4.12 Front Diphthongs by Text Type excluding Waldensian Prose Texts………196

Table 4.13 Back Diphthongs by Text Type…………………………………………….199

Table 4.14 Number and Percentage of [wo] and [we] Variants by Text Type ……...…201

Table 5.1 Adjective Lemmas and Tokens by Dialect of Text Compared with the Lyric Poetry…………………………………………………………..…………….212

Table 5.2 Name Lemmas and Tokens by Dialect of Text Compared with the Lyric Poetry...... 213

Table 5.3 Glide-Initial Diphthongs by Dialect of Text Compared with Lyric Poetry….214

Table 5.4 Adjective Lemmas and Tokens by Date of Text Compared with Lyric Poetry……………………………………………………………………………….215

Table 5.5 Glide-Initial Diphthongs by Date of Text Compared with Lyric Poetry….217

List of Figures

Figure 1.1 Map of Approximate (Old) Occitan Dialect Areas…………………..………29

Figure 3.1 Percentages of Lemmas and Tokens with Each Suffix in the Entire COM Corpus………………………………………………………………………….…...108

Figure 3.2 Percentages of Adjective Lemmas and Tokens with Each Suffix in the Entire COM Corpus………...…………………………………………………...….110

Figure 3.3 Percentages of Name Lemmas and Tokens with Each Suffix in the Entire COM Corpus………..……………………………………………………………....112

Figure 3.4 Percentages of Lemmas and Tokens with Each Suffix by Dialect of Text....120

Figure 3.5 Percentages of Adjective Lemmas and Tokens with Each Suffix by Dialect of Text………………………………………………………………………………123

Figure 3.6 Percentages of Name Lemmas and Tokens with Each Suffix by Dialect of Text………………………………………………………………………………....127

Figure 3.7 Percentages of Lemmas Derived with Each Suffix by Date of Text……..…130

Figure 3.8 Percentages of Tokens Derived with Each Suffix by Date of Text………...……....134

Figure 3.9 Percentages of Adjective Lemmas and Tokens with Each Suffix by Date of Text………………………………………………………………………………137

Figure 3.10 Percentages of Name Lemmas and Tokens with Each Suffix by Date of Text…………………………………………………………………………………141

Figure 3.11 Percentages of Total Derived Lemmas and Tokens with Each Suffix by Text Type………………………………………………………….………………..144

Figure 3.12 Percentages of Adjective Lemmas and Tokens with Each Suffix by Text Type………………………………………………………………………………..147

Figure 3.13 Percentages of Name Lemmas and Tokens with Each Suffix by Text Type...... 149

Figure 4.1 Cobla VI of Si·m fos Amors de joi donar tant larja (de Conca 2008, p.205‒6)………………………………………………………………………….…165

Figure 4.2 Percentages of Front Diphthong Variants by Dialect of Text..……………..177

Figure 4.3 Percentages of Back Diphthong Variants by Dialect of Text…………..…...181

Figure 4.4 Percentages of Front and Back Diphthongs by Date of Text………..……...190

Chapter One: Introduction

Pauc val chans que dal cor non ve (PC 70 32)

A song which does not come from the heart is worth little.

1.1 Introduction

Although there is a large corpus of Old Occitan texts, the majority of the linguistic analysis on the language has been done using only one type of text: the lyric poetry of the troubadours. Our understanding of the language and its development is thus less complete and accurate than it could be if all of the types of texts that constitute the full corpus were considered. This study seeks to further our understanding of the language and its development by considering all of the texts in the Old Occitan corpus, using the Concordance de l’Occitan Médiéval (COM). This study aims to discover whether considering the understudied prose and non-lyric poetry texts in an analysis will uncover the same variants and patterns of development as the lyric poetry, or to what extent these vary between text types.

The ongoing publication of the COM, which includes the entirety of the Old Occitan corpus for the first time, allows the prose and non-lyric poetry texts to be searched and analyzed quantitatively; without the COM, the present study would not be possible. Using this corpus and taking previous research on and descriptions of the language as a starting point (e.g. Anglade 1977, Jensen 1976), this study compares the attestations and patterns of variation in phonological and morphological features among the three major types of text in Old Occitan: lyric poetry, non-lyric poetry, and prose.

While it is the poetry that draws many people to the study of Old Occitan, the focus on the lyric troubadour poetry in linguistic analysis is not without problems. Many of the troubadours are known for their virtuosity and playfulness, and some poets were not native speakers of Old Occitan. Because of these factors, I propose that a systematic exploration of the language attested in the non-lyric poetry and prose texts, and in the corpus as a whole, gives better insight into the development of the language and into the chronology and spread of both phonological and grammatical changes. The aim of this thesis is to provide such an investigation of the development and usage of key phonological changes and morphological features in the two largely understudied text types, non-lyric poetry and prose, as compared with their attestation in the lyric poetry. Three aspects of the Old Occitan language are considered here: the creation of comparative adjectives, the derivation of adjectives, and the use of the glide-initial diphthongs. Comparing these aspects of the language across the three types of text shows where the patterns in each type of text differ.

Thus, in this study, I seek to further understand the relationship between variation and text type, including the representation of sound change in progress. The investigation sheds light not only on Old Occitan itself, by considering the full corpus rather than a subset of it, but also on the development of Old Occitan, by considering the attestation and spread of changes in progress.

1.2 Focus of Previous Research on Old Occitan

The study of Old Occitan has primarily been based on the troubadour poetry, the well-known and widely read and studied poems of medieval southern France. While the content of the poems is of great interest to medievalists, literary critics, musicologists, and casual readers alike, the language of the poems has been of primary interest to Romance linguists and historical linguists for almost 200 years. In 1821, Raynouard proposed that Old Occitan was the root of the Romance languages, and that Italian, Spanish, and French were all derived from Old Occitan. Although this has long been known to be false, it shows the central place that some scholars have given to Old Occitan in understanding the Romance languages.

Even so, linguistic studies of Old Occitan have often focused on the lyric poetry, sometimes exclusively. For example, according to its introduction (p. vi), Jensen’s (1986) The Syntax of Medieval Occitan, a landmark work on the syntax of the language, a topic that had been rather neglected until recently, was at least briefly considered for the title Troubadour Syntax; moreover, the majority of its bibliography of sources consists of editions and manuscripts of lyric poetry.

The language of the non-lyric poetry and prose texts has not been entirely neglected; many editions of such texts, such as Ingegärd Suwe’s edition of the Vida de Sant Honorat, include detailed linguistic analyses of the language used in the text in question. These analyses, however, generally consider only that text, and they have often not been connected to our broader understanding of the Old Occitan language and its development. Further, the majority of the non-lyric poetry and prose texts have not been published in such editions, so the language used in those texts has not, in general, been subject to analysis.

The analysis in this thesis takes a different approach. Rather than focusing on the lyric poetry and assuming that what is found there accurately describes the full corpus of Old Occitan texts, I consider the full corpus, and I also consider the patterns found in each text type separately and compare them. The study thus aims to analyze the full Old Occitan corpus, including the understudied non-lyric poetry and prose texts, making broader generalizations and connections rather than proceeding text by text.

1.3 The Old Occitan Corpus

The Old Occitan period is generally taken to be 1000-1500 AD; although there are few texts from the earliest part of this period, the rest of the period is robustly represented. The distribution of texts by date is discussed below in Section 1.7.1.

Old Occitan is best known for the troubadour poetry, a large collection of lyric poetry dealing with a wide range of subjects. In addition, however, Old Occitan is also represented by non-lyric poetry, such as several long romans, narrative retellings of biblical stories, and histories, as well as by prose texts, including charters, medical texts, and the vidas, which are short prose narratives of the lives of saints and poets. The non-lyric poetry and prose texts are discussed below in Sections 1.5.2 and 1.5.3 respectively. In all, the Old Occitan language has come down to us in hundreds of texts containing over seven million words.

1.3.1 The Manuscript Tradition

The oldest surviving traces of Old Occitan come from around 1000 AD. For the lyric poetry in particular, the first known troubadour was Guilhem de Peitieus (1071-1127), but many scholars believe that his poems show a well-established tradition rather than a fledgling one, so it is likely that the tradition extends back further (Dronke 1968). The troubadours are generally agreed to have lived and written between approximately 1100 and 1300. The earliest surviving manuscripts of their poems, however, date from about 1250. This leaves a gap of around 150 years between the earliest known poet and the earliest extant chansonnier, a large collection of lyric poetry and associated texts. In addition, many of the chansonniers were compiled in Italy or Spain. It is commonly believed that during at least part of this period, the poems were orally transmitted and only written down later (Akehurst and Davis 1995, Paden 1998). The manuscript traditions and precise composition dates of the other works are less clear, except in the case of charters, most of which are dated. Though the manuscript tradition is less clear for some of the non-lyric poetry and especially the prose texts, the gap between the composition of a text and the extant manuscript tends to be much shorter, and the likelihood of an oral tradition lower (Brunel 1953, Paden 1998). For this reason, there are considerable advantages to studying the prose texts more closely.

Many of the lyric poems come down to us in more than one manuscript, but few other texts do. Brunel’s 1935 Bibliographie lists 376 manuscripts in Old Occitan, most of which date from the thirteenth and fourteenth centuries, and it is generally considered the authoritative inventory of Old Occitan manuscripts. He notes that many of them, particularly those containing prose texts, had not been published and were not readily available at the time, although some have been published since then. The lyric poetry, however, which comes primarily from the 95 chansonniers included in Brunel’s list, has been frequently edited and published, both in critical editions of the poems themselves and, in some cases, in editions of entire chansonniers.

Zufferey (1987) describes forty manuscripts of the chansonniers in some detail. He considers only the manuscripts of lyric poetry written in Occitania, removing from consideration those written in Italy or Spain, as well as those that do not consist mostly of lyric poetry. For each manuscript, he describes thirty-two characteristics of the spelling, especially the notation of sounds known to vary greatly or to be undergoing change, such as the mid vowels, the diphthongs, the palatal sounds, and the reflexes of Latin [k]. In addition, he notes particular or unusual forms in other areas of the grammar, including the syntax. The manuscripts of non-lyric poetry and prose texts, however, have not, in general, been described and analyzed in this level of detail.

1.3.2 Previous Editions and Availability

Until recently, many of the prose texts and some of the non-lyric poetry were almost impossible to find, and even those which were published in editions were often inconsistently edited. Some of the shorter prose and non-lyric poetry texts could be found only in journal articles from the late nineteenth century or in compilations such as K. Bartsch’s 1856 Denkmäler der provenzalischen Litteratur. Other prose texts, such as the Razos de Trobar, have been available in editions since the 1840s, but many of these editions have been problematic in various ways (see Marshall 1972). The lyric poetry, however, has been frequently edited and published and is, for the most part, easy to find in any format desired. In addition, the more prominent non-lyric poetry texts, such as the Breviari d’Amor, Flamenca, and the Chanson de la Croisade contre les Albigeois, are readily available in critical editions.

Digital corpora of texts are important for reliable quantitative linguistic analysis, but, until very recently, few texts were available digitally. An exception is the digital Provençal Database of the Project for American and French Research on the Treasury of the French Language (ARTFL). This database includes many of the lyric troubadour poems in an easily searchable format. Similarly, Rialto, the Repertorio informatizzato dell’antica letteratura trobadorica e occitana, has released digitized versions of some texts on the internet, including the “new monumental edition of the Nouveau Testament de Lyon” by M. Roy Harris and Peter T. Ricketts (Rialto).

1.3.3 The Concordance de l’Occitan Médiéval

The Concordance de l’Occitan Médiéval (COM), created by Peter Ricketts and Alan Reed with collaboration from many other scholars, includes the entirety of the Old Occitan corpus in a digital format for the first time. The COM is being published in at least three parts. The first, containing the lyric troubadour poetry, was published in 2001, followed by the narrative, or non-lyric, poetry in 2005. The third tranche, or part, consists of the prose texts and will likely be published late in 2012. Finally, there are plans to develop a fourth tranche of the songbooks of the troubadours. This fourth tranche will include all of the versions of the troubadour poems that appear in each of the chansonniers, rather than a single critical edition of each poem. The publication of this proposed fourth tranche of the COM will help to solve the problem of multiple manuscripts discussed in Section 4.2, but the Old Occitan corpus of texts will be essentially complete with the prose texts. The COM is, and will continue to be, an invaluable resource for scholars of Old Occitan.

The COM includes, in most cases, a single version of each text. In some cases, particularly for texts which occur in many manuscripts, a critical edition is given, while for other texts, the edition is a diplomatic edition of the reading in one manuscript.

Ricketts gives an extensive bibliography showing where the version of each text used in the COM is found. In rare cases, such as the Regles de Trobar, a grammatical text from the end of the fourteenth century, more than one version of the text is given in the COM.

The COM aims to include all Old Occitan texts. In practice, however, some texts have not been included: Peter Ricketts’s most recent announcement concerning the prose texts states that the administrative texts included in the prose tranche will be limited to those already encoded. In any case, the COM includes, if not all, the vast majority of the Old Occitan corpus in a searchable database.

Users of the COM have access to a prepared concordance of word forms with the number of tokens of each word form in each tranche of the COM. Word forms with capital letters are at the end of the concordance, preceded by @. Once the word forms of interest are chosen, the COM gives the context of each occurrence with between one and seven lines of surrounding text and the siglum of the text in question. For any example, the entire text in which the example occurs may also be viewed. In addition to searching for whole word forms, the COM interface allows the concordance to be searched for prefixes and suffixes. The output of prefix or suffix searches is the list of words in the concordance which begin or end, respectively, with the string of characters entered.
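The prefix and suffix searches described above amount to simple string matching over the list of word forms. The following Python sketch illustrates the idea under stated assumptions. The concordance is modelled as a dictionary mapping each word form to its token count. The function names `prefix_search` and `suffix_search` are hypothetical and are not part of the COM interface. The representation of capitalized forms with a leading @ is also an assumption. The forms are Old Occitan spellings, but the counts are invented for illustration and are not COM figures.

```python
# Hypothetical sketch of prefix/suffix searching over a word-form concordance.
# The forms are Old Occitan spellings, but the token counts are invented for
# illustration; they are NOT figures from the COM.
concordance = {
    "melhor": 12,
    "melhors": 5,
    "meillor": 3,
    "amor": 40,
    "@Amor": 7,  # assumed representation: capitalized forms marked with a leading "@"
}


def prefix_search(conc, prefix):
    """Return the word forms that begin with `prefix`, sorted by Python's string order."""
    return sorted(form for form in conc if form.startswith(prefix))


def suffix_search(conc, suffix):
    """Return the word forms that end with `suffix`, sorted by Python's string order."""
    return sorted(form for form in conc if form.endswith(suffix))


print(suffix_search(concordance, "or"))   # ['@Amor', 'amor', 'meillor', 'melhor']
print(prefix_search(concordance, "mel"))  # ['melhor', 'melhors']
```

Python's default string ordering places forms beginning with @ first, whereas the COM lists capitalized forms at the end of the concordance. The sketch makes no attempt to reproduce the COM's own ordering.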

Ricketts makes no attempt, however, to lemmatize words within the COM. The orthographic variations in the texts and the editions used in the COM database all appear separately in the concordance of word forms. To search for all examples of a particular lexeme, the user must input all of the variant orthographic forms. In addition, only one search term can be used at a time; that is, constructions or phrases cannot be searched for directly. Instead, the user must search for one part of the construction or phrase and then sift through the output for attestations of the construction or phrase in question.
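Because the COM is not lemmatized and accepts only one search term at a time, a lexeme count has to be assembled from a list of spellings supplied by the researcher. Likewise, a multi-word construction has to be recovered by filtering the contexts returned for one of its words. The Python sketch below illustrates both steps. The function names, token counts, and context strings are hypothetical and invented for illustration; the context strings are not attested lines. The plain whitespace tokenization is a simplifying assumption that ignores punctuation.

```python
# Hypothetical sketch: the COM is not lemmatized, so a lexeme's total must be
# assembled from a list of spellings supplied by the researcher.
# Token counts are invented for illustration; they are NOT figures from the COM.
concordance = {"melhor": 12, "melhors": 5, "meillor": 3, "fort": 30}

# Invented strings standing in for the contexts returned by a search on "plus";
# they are not attested lines from any Old Occitan text.
contexts = ["es plus fort", "fort e bel", "plus bel e plus fort"]


def lemma_total(conc, variants):
    """Sum token counts over the supplied spellings.

    Duplicate spellings are counted once, and unattested spellings add 0.
    """
    return sum(conc.get(form, 0) for form in set(variants))


def find_construction(lines, anchor, following):
    """Keep the lines in which `anchor` is immediately followed by `following`.

    Tokenization is a plain lower-cased whitespace split, so punctuation
    attached to a word would prevent a match.
    """
    hits = []
    for line in lines:
        tokens = line.lower().split()
        if any(a == anchor and b == following for a, b in zip(tokens, tokens[1:])):
            hits.append(line)
    return hits


print(lemma_total(concordance, ["melhor", "melhors", "meillor", "millor"]))  # 20
print(find_construction(contexts, "plus", "fort"))  # ['es plus fort', 'plus bel e plus fort']
```

In practice the variant list and the filtering criteria are the researcher's responsibility; the sketch only shows why both steps are needed when the search tool works on single, unlemmatized word forms.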

1.4 Text Types and Linguistic Analysis

Considering only one text type has been shown to result in incomplete descriptions and even mistaken analyses. Hock (2000) clearly shows that the full range of syntactic variants can be found and analyzed only by considering the largest possible variety of texts. In many languages, such a variety of texts is unavailable. In Old Occitan, however, a wide variety of text types and genres is available, but these resources have been little used in linguistic analysis. In addition, Hock (2000) argues that in the case of Vedic Sanskrit, recognizing syntactic differences between prose and poetry is necessary in order to distinguish variation attributable to the type of text from variation resulting from historical change. One question this study seeks to answer is whether the same is true for Old Occitan.

Herring et al. (2000) argue that there is no reason why the study of sound change should be incompatible with textual parameters, although their volume does not include any discussions of this type. The present investigation will shed light on the relationship between variation of phonological patterns and text type, in addition to considering morphological patterns more in line with previous work on language and text types.

Text type differences have been claimed to have a close relationship with language change. It is frequently claimed, for instance, that Old English (OE) verse syntax and word choice are more archaic than OE prose syntax (e.g. Hock 1985, Hutcheson 1995; cf. Sampson 2010). It has been hypothesized that this is because verse may simply constitute a more conservative register, using certain lexical items or syntactic structures which fell out of common parlance long before the poem was written. If the same holds for Old Occitan, this study, which compares poetry and prose texts from the same time period, provides some insight into the diachrony and development of the language. It is, of course, also possible that poetry, especially the lyric poetry of the troubadours, is less conservative than the prose from the same time period.

1.5 Text Types in Old Occitan

The lyric poetry of the troubadours has been the focus of modern interest in the language and culture of this area during the Old Occitan period. Though the non-lyric texts, both poetry and prose, are much less well known than the lyric poetry, and less well studied and described, there are many such extant texts. In this thesis, I consider not only the lyric texts but other types of texts as well.

Although most recent work on the relationship between different kinds of texts and language use has focused on narrow genres such as scientific articles (e.g. Swales 1990) or sections of newspapers (e.g. Jucker 1992), this work has been done on English and other languages in which the broader categories of texts have already been studied. In Old Occitan, however, work on the larger categories has not yet been done, and the relationship between those broad categories and linguistic patterns is my focus here.

Given the strong association of the term “genre” with the subject-based and formal types of lyric poetry, as discussed in Section 1.5.1.1 below, I refer to the broad distinctions between lyric poetry, non-lyric poetry, and prose as “text types” rather than “genres”, though “genre” is the more common term in studies of this type.1

In this thesis, I consider three broad text types: lyric poetry texts, non-lyric poetry texts, and prose texts. These text types are introduced in Sections 1.5.1, 1.5.2, and 1.5.3 respectively. Though the lyric poetry texts and non-lyric poetry texts are both poetry, I consider them separately in this thesis primarily because of the emphasis placed on the lyric poetry in the literature on Old Occitan. I am interested in whether descriptions of language use based on the lyric poetry accurately describe what is found in the non-lyric poetry. That is, I am interested in whether the lyric and non-lyric poetry texts show the same patterns of variation and language use, or whether there are differences between them.

1 I am aware that “text type”, like “genre”, is often used to refer to more narrowly defined groups of texts in other languages, such as newspapers, , etc., and therefore may have the same connotations and baggage to some readers that “genre” does, but for the purposes of this thesis, I consistently use “text type” to refer to the distinctions between lyric poetry, non-lyric poetry, and prose.

1.5.1 Lyric Poetry

Lyric poetry, according to the Princeton Encyclopedia of Poetry and Poetics, is the branch of poetry that retains most prominently the elements which show its origin in musical expression (p. 713). While music and musical accompaniment may exist in other branches of poetry, which I have here collectively called non-lyric poetry, they are only secondary in these types of poetry. In lyric poetry, on the other hand, the music is intrinsic to the poem, and the elements which lyric poetry shares with the musical forms that produced it are the irreducible denominator of lyric poetry (Preminger and Brogan 1993). The lyric poetry of the troubadours is lyric not only in the broad modern sense of a type of poetry which is “mechanically representational of a musical architecture and which is thematically representational of the poet’s sensibility as evidenced in a fusion of conception and image” (Preminger and Brogan 1993, p. 715), but also in the older and more restricted sense of a poem written to be sung. Although we have the melodies for only about 10% of the extant troubadour lyric poems, it is assumed that virtually all of them were originally set to music (Zumthor 1995). In fact, while many scholars in various fields turn to the troubadours for their content and poetry, the real legacy of the troubadour poets, according to Zumthor, lies in the musicality of their poetry rather than in its themes.

Critical attempts to define lyric poetry, with varying degrees of success, have included elements such as brevity, the spontaneous overflow of powerful feelings, and intensely subjective and personal expression (Preminger and Brogan 1993). All of these elements are very clear in the lyric poetry of the troubadours. Many of the poems concern either praise or condemnation of people, actions, trends, or aspects of society in ways that can only be described as subjective, while others are focused on emotions or reactions. The lyric poems of the troubadours are also relatively short. The longest by far is the 329-line Jesus Cristz, nostre salvaire of Peire Cardenal. While there are a few other lyric poems of more than 150 lines, the vast majority contain between twenty and forty lines of text. Some of the non-lyric poems, on the other hand, contain thousands or even tens of thousands of lines.

The lyric poetry of Old Occitan is what draws many scholars to the language. Though the period of troubadour poetry lasted less than 200 years, the lyric poetry of the troubadours was a great influence on the literature and music of Europe (Zumthor 1995). Paden (1993) has referred to Old Occitan as a “lyric language”, particularly with reference to the perception by both native and non-native speakers of Old Occitan as the language in which lyric poetry is most appropriate. This perception is indicated by the bits of lyric verse in Old Occitan which were sometimes inserted into texts, adapted in what “comes perilously close to ” (Paden 1994, p. 37).

The COM includes all of the known lyric poems of the troubadours, including all genres of lyric poetry (see Section 1.5.1.1 below). This yields a lyric poetry corpus of 4,552 poems comprising 726,426 words and 39,682 word forms. References to specific lyric texts or poets use the reference system from Ricketts’s Concordance de l’Occitan Médiéval:

The reference to the poet and poem follows the bibliographical index established by István Frank in his Répertoire métrique de la poésie des troubadours (2 vol., : Champion, 1966) following the Bibliographie der Troubadours of A. Pillet and . Carstens (Halle, 1933). In the few cases where Frank did not include a particular poem, the reference is to Pillet-Carstens, and, in the rare cases where the poem is in neither, a new reference is used following the system of identification (e.g. PC 440 1). (Ricketts 2005)

Where a specific example is used in this thesis, the reference is given alongside the text in the same format. Complete bibliographic information of the text, poet, and source of the edited text used in the Concordance de l’Occitan Médiéval is given at the end of the thesis in Appendix A.

1.5.1.1 Genre in Old Occitan

Given the attention paid to troubadour poetry, anyone familiar with Old Occitan will understand a “genre” distinction to refer not to the distinction between prose and poetry, but to the distinctions between varieties of lyric poetry: the cansos (love songs), sirventes (satire, usually political), tensos (debate poems), and others (Paden 2000). Hill and Bergin (1941) and Frank (1957) list and describe twenty-five genres of troubadour poetry and clearly label each known poem as belonging to one of these genres. These genres are differentiated primarily by subject matter, but also by broad correlations of the genre with line , counts, perspective of author, stated audience, and other formal features. Paden (2000) describes the development, history, and interrelationships of these genres, and argues that there are many poems, especially early ones, that do not fit neatly into one of these categories. In addition, he identifies several “marginal genres” populated by only a few poems. For the purposes of this thesis, however, I consider the troubadour corpus, with all of its included genres, as one category, which I refer to as lyric poetry. Here, I am not looking for linguistic distinctions and correlations among the genres within the lyric poetry. It is quite possible, even likely, that there are linguistic differences among these lyric genres, especially in the syntax, but this is not the focus of this thesis.

1.5.2 Non-lyric Poetry

In addition to the lyric poetry, there is a large corpus of non-lyric poetry. Much of this poetry is narrative, though Limentani (1977) characterizes narrative work as “exceptional” within the literary texts of the . The romance Flamenca, for example, is a narrative poem that tests the viability of the ideals of the lyric troubadour poets. The corpus of non-lyric Old Occitan poetry includes very long texts, such as the epic poem Chanson de la Croisade contre les albigeois and the immense Breviari d’Amor of Matfré Ermengaud, as well as short dialogues, short narratives of the lives of saints and poets, religious works, and dramas. In addition, there are a few letters written in verse and a collection of nine ensenhamens, which are instructional poems that may be the earliest didactic poetry in with non-religious inspiration (Fleischman 1995). Fleischman (1995) gives a description of the kinds of texts found in Old Occitan non-lyric poetry and examples of texts from each. This corpus of poetry, though smaller than the narrative poetry corpus which survives in northern France from the same time period, is a vast and widely varied base for linguistic research.

As described in Section 1.2, many of the non-lyric poems have been carefully edited and described, particularly the long poems such as the Breviari d’Amor, the Vida de Sant Honorat, the Leys d’Amors, and the epic poems. Published editions of these texts often include lengthy discussions of the linguistic features found within the text. Many other poems, however, particularly the shorter works, have not had such linguistic studies published, and the non-lyric poetry text type, unlike the lyric poetry, has not been studied as a whole.

The COM includes 351 non-lyric poems, comprising 1,631,012 words and 95,159 word forms; this is more than twice the size of the corpus of lyric poetry. In the COM, Ricketts gives each text a three-letter siglum; the Chanson de la Croisade contre les albigeois, for example, is referred to as CCA. These three-letter sigla are used in this thesis to refer to examples from specific texts; complete bibliographic information of the text and the source of the edited text used in the COM is given at the end of the thesis in Appendix A.


1.5.3 Prose

The prose texts of Old Occitan are at least as varied as the non-lyric poetry texts. The majority of the prose texts are administrative texts: charters, chronicles, law codes, and lists of donations and events at various monasteries and . There are also several medical texts, such as La Chirurgie d’Albucasis and a Fragment of Recettes Médicales en Langue d’Oc, and a small handful of technical texts, such as an Herbier and La siensa d’atermenar de Bertran Boysset. Other types of texts found in the prose include biblical translations (most frequently of parts of the New Testament), glossaries, and a calendar with a list of saints’ days. While these topics are found only in the prose, there are also many prose texts with the same subject matter and purpose as some of the non-lyric poetry texts. For example, there are didactic texts, letters, and religious texts in prose as well as in poetry. There are also historical and narrative prose texts in addition to those in verse. Finally, there are grammatical texts among both the prose texts and the non-lyric poetry texts, though the prose grammatical texts are much shorter and more concise than the poetic ones.

As of May 2011, there were 285 prose texts in the as-yet-unpublished third tranche of the COM. A number of further texts that will be included in the published COM had not been checked and finalized at that time and are not included in the analysis reported in this thesis. The 285 texts included in this discussion contain 5,033,319 words and 209,520 word forms, so the prose texts account for over twice as much material as the lyric and non-lyric poetry combined. Quantitatively, then, the prose corpus is more than sufficient in size to allow for meaningful analysis and comparisons.

Ricketts assigns a four-letter siglum to each prose text similar to the three-letter sigla used to identify the non-lyric poems. For example, La Chirurgie d’Albucasis, a fourteenth-century medical text, is identified by the siglum ALBU. These four-letter sigla are used in this thesis to refer to examples from specific texts; complete bibliographic information of the text and the source of the edited text used in the COM is given at the end of the thesis in Appendix A.

1.6 Why Study the Prose Texts?

One reason to study the prose texts of Old Occitan is simply numerical. The prose texts comprise over five million words, accounting for over two thirds of the total extant Old Occitan corpus. Including these texts significantly widens the material base for study. The larger number of texts and words we have with the inclusion of the prose texts makes computational techniques and statistical studies of Old Occitan more reliable. It also makes it less likely that our understanding of Old Occitan will be skewed by individual authors or texts, as it allows for the identification of outliers or unexpected occurrences and patterns within individual texts.
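The proportions cited here, and the comparison of prose with poetry in Section 1.5.3, follow directly from the word counts given for each text type. The following minimal check in Python uses only those three counts. As noted above, the prose figure covers only the 285 prose texts available as of May 2011.

```python
# Word counts per text type, as reported for the COM in Sections 1.5.1-1.5.3.
# The prose count covers only the 285 prose texts available as of May 2011.
WORD_COUNTS = {
    "lyric poetry": 726_426,
    "non-lyric poetry": 1_631_012,
    "prose": 5_033_319,
}


def shares(counts):
    """Return each text type's share of the total word count, as a fraction."""
    total = sum(counts.values())
    return {text_type: n / total for text_type, n in counts.items()}


if __name__ == "__main__":
    total = sum(WORD_COUNTS.values())
    print(f"Total: {total:,} words")
    for text_type, share in shares(WORD_COUNTS).items():
        print(f"{text_type:>17}: {share:.1%}")
    # Compare prose with both kinds of poetry taken together.
    poetry = WORD_COUNTS["lyric poetry"] + WORD_COUNTS["non-lyric poetry"]
    print(f"Prose / all poetry: {WORD_COUNTS['prose'] / poetry:.2f}")
```

Run as written, the script reports shares of roughly 9.8% for lyric poetry, 22.1% for non-lyric poetry, and 68.1% for prose. These figures agree with the statement that the prose makes up over two thirds of the corpus.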

In the same way, the larger number of texts and words makes it less likely that our understanding of Old Occitan will be skewed by the practices of one individual type of text. The inclusion of the prose texts, with their widely varied topics, purposes, and formats, provides crucial evidence for determining which aspects of the language are shared across text types and which can be attributed to the type of text because they appear in only one type.


Further, at least some of the prose and non-lyric texts, as noted in Section 1.3.1 above, have a shorter gap between the composition and the manuscripts, giving us linguistic material closer to the source. The chance of external influence, especially from a non-native compiler, as in the case of some of the chansonniers, is therefore smaller for the prose texts.

The prose texts, by definition, have no meter or rhyme scheme. It is not clear that the decisions of word choice and phrasing in the lyric poetry are necessarily driven by considerations of meter and rhyme. Even so, the prose texts give us a large corpus in which rhyme and meter play no role. Sampson (2010) argues that the constructions found in the noun phrases of Old English poetry are also found in the prose, but that the frequency of usage of some of these structures is different in the poetry because of the metrical and stylistic constraints of the poetry. In the same way, the patterns of usage of some words or phrases in the Old Occitan prose texts may be different from those in the poetry because of the lack of meter and rhyme. A close analysis of these differences could provide insight into which patterns are due to considerations of meter or rhyme and which patterns are independent of such considerations.

In addition, the inclusion of the prose texts gives us material that may be closer to the everyday language of speakers. The lyric poetry of the troubadours has been described by some as “playful” and “virtuosic” (Akehurst and Davis 1995). The poets showed a delight in using near synonyms, rare words, the same word with different meanings, different derivations of the same root, puns, and other word play. The rhyme schemes and line formations are very strict and sometimes tangled and hard to read; in extreme cases, such as the trobar clus, the form of poetry is deliberately used to obscure the meaning of the poem (Akehurst and Davis 1995). The poems were performed in the courts as well as written and circulated, and were expressly meant for the entertainment of the educated elite. While this may not necessarily bar the lyric poetry from being an accurate reflection of the language, it does at least raise serious questions about its authenticity (see Section 1.8.2 below). That is, it raises questions about whether the forms attested in these poems represent the true range of variation in everyday speech, and whether, given the troubadours’ reputation for word play and innovation, some of the attested forms reflect actual usage at all. Of course, the prose and non-lyric poetry texts may not reflect actual language use either, due to the textual transmission and the fact that they are undeniably written texts, in addition to context-specific conventions for texts such as the charters. In spite of this, however, the prose texts may offer a better, or at least an alternative, reflection of the language actually used and passed down than the ostentatious performative lyric poetry.

Finally, the language of the troubadours, insofar as it can be described as a single entity, has been analyzed by some as a koine. The reasons given are its lack of a geographic root, the presence of forms from many dialects, and its use as a regional lingua franca (Cantalusa 1990, Paden 1998). Based on Siegel’s (1985) definition of koine, however, this use of the term seems inaccurate. There is no evidence that the Old Occitan of the lyric poems was based on one dialect with only a few features from other dialects combined into a new dialect. Nor is the troubadours’ language consistent in terms of which forms are derived from particular dialects. Instead, forms of the same word from various dialects are used side by side in Old Occitan lyric texts. The non-lyric poetry and prose texts, although much less well studied, have the potential to show less dialect mixing and to be closer to the language of actual individual speakers than to the lingua franca or court koine of the lyric poetry.

The prose texts, then, may have considerable value to linguists as possibly more accurate but drastically understudied glimpses of the language outside the flashy courtly poems of the troubadours, but they are not without their own problems. Aside from difficulties of access, which have been solved by the Concordance de l’Occitan Médiéval, many of the charters show pervasive code-switching between Medieval Latin and Old Occitan (Brunel 1926). In addition, some of the administrative or notarial texts are very formulaic. It has also been suggested that some of the medical texts are translations from Latin (Harris and Vincent 1988), and all of the biblical texts are translated, either from the original Hebrew and Greek or from some intermediate language such as Latin. Further, especially after the Albigensian crusade, administrative documents show constant influence from the French dialects to the north (Paden 1998). While these are important issues to keep in mind, they do not outweigh the possible benefit of considering these sources for evidence of changes in progress.

1.7 Other Methods of Classifying Old Occitan Texts

In addition to the division into three broad types of text discussed in the previous section, there are other ways of classifying the texts that may help account for linguistic variation. I consider two of these ways here: the date of the text and the dialect of the text.

1.7.1 Date

In order to consider how the date of the text may be relevant to the grammatical variants in question, I assign texts to periods of fifty years wherever possible. The Concordance de l’Occitan Médiéval (COM) itself does not give any information about the texts, but it does include a bibliography of Ricketts’s sources of texts. The date, approximate date, or date range given by the editors of Ricketts’s sources is used to categorize the texts. Although some texts, particularly prose texts, have a date or range of dates as part of the text itself, for most texts the editors’ decisions concerning dates were based on handwriting, watermarks, or colophons.

For all texts where there is a choice between the date of composition and the date of the manuscript, and the two dates are within 100 years of each other, the date of the manuscript is chosen. Milroy (2000) argues that some features of texts are the result of their textual history, often because the texts were copied by scribes from different dialect areas, and the same is true of Old Occitan texts. This raises the question of whether the data represented in a text are the product of the author or of the later writer(s) or copyist(s), who may have elaborated or changed the text in various ways or simply copied it with more or less accuracy. This is a larger concern when there is a large gap between the date of composition and the date of the manuscript, or when the text was copied in a different area. If the gap between the date of composition and the date of the manuscript is more than 100 years, the text is not included in any time period. This decision was made because such a text might include features of both time periods and be too mixed to be a clear example of either for the purposes of a quantitative analysis. Though close qualitative analysis of these texts would no doubt be insightful, the practices of copying and addition that make these texts interesting also have the potential to cause problems for quantitative analyses.

In addition, if the date of a text falls between two categories, it is assigned to the later of the two. For example, a text whose date is described as “turn of the fourteenth century” is assigned to the 1300-1350 date range, and a “mid-fourteenth century” text is assigned to the 1350-1400 date range.
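The date rules described above amount to a small decision procedure. The sketch below shows one way to express it in Python. It is illustrative only and rests on three assumptions:

- Each date has already been reduced to a single representative year, for example 1300 for “turn of the fourteenth century” and 1350 for “mid-fourteenth century”.
- The function name `date_period` is hypothetical. It is not part of the COM and not the procedure actually used for this thesis.
- Texts surviving in multiple manuscripts, or containing entries from a very wide range of dates, are not modeled here.

```python
from typing import Optional

FIRST_START = 1050  # start of the earliest period used in this thesis
LAST_END = 1500     # end of the latest period used in this thesis
WIDTH = 50          # length of each period in years
MAX_GAP = 100       # largest tolerated composition-to-manuscript gap


def date_period(composition: Optional[int], manuscript: Optional[int]) -> Optional[str]:
    """Hypothetical helper: assign a text to a fifty-year period.

    Returns a label such as '1300-1350', or None if the text is excluded.
    Dates are assumed to be single representative years (a simplification).
    """
    if composition is not None and manuscript is not None:
        if abs(manuscript - composition) > MAX_GAP:
            return None  # gap over 100 years: text left out of every period
        year = manuscript  # within 100 years: the manuscript date is preferred
    elif manuscript is not None:
        year = manuscript
    elif composition is not None:
        year = composition
    else:
        return None  # no date information at all

    if not FIRST_START <= year < LAST_END:
        return None  # outside the periods used here (an assumption)

    # Floor division puts a boundary year such as 1300 or 1350 into the
    # later of the two adjacent periods, matching the rule in the text.
    start = FIRST_START + (year - FIRST_START) // WIDTH * WIDTH
    return f"{start}-{start + WIDTH}"
```

Here are two examples of how the function behaves:

- `date_period(None, 1375)` returns "1350-1400". This would apply to a manuscript copied in 1375, such as that of the Vie de Sainte Marie-Madeleine.
- `date_period(None, 1300)` returns "1300-1350". This places a text dated to the turn of the fourteenth century in the later of the two adjacent periods.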

The fifty-year division is a largely arbitrary time frame. The choice of this division is driven entirely by the material available to me as I analyzed the texts. While the dates of some manuscripts are very specific, such as the manuscript of Vie de Sainte Marie-Madeleine, copied on August 4, 1375, that is not the case for the vast majority of manuscripts. For example, the manuscript containing Court Poème Suivant le Dialogue entre Jésus et l’arpenteur is dated to « la première moitié du XVe siècle » (Thiolier-Méjean 1997, p. 191) and the Etablissement du marché à Montagnac is from « la fin du XIIIe siècle » (Vidal 1901, p. 70). Practically speaking, the dates in the descriptions and editions used by Ricketts require a half-century division. If I had tried to divide the texts into smaller periods of time, many more texts would have had to be excluded from a consideration of date. It would not be clear which decade, or which period of twenty or twenty-five years, would be most appropriate for a “first half of the fourteenth century” or “end of the thirteenth century” text. Using periods of fifty years allows me to easily include these texts.

In many cases, information about the date of a text is unavailable or contradictory. For some texts, the published edition(s) simply give no information about the date of the manuscript. Other texts, such as many of the chronicles, include material from a very wide range of dates. The most extreme example of this is the Cartulaire de la ville de Lodève, which has administrative entries from 1246 to 1513. Texts of both of these types are not assigned to any of the date-related categories and are excluded from the analysis of texts by date, though they are included in the analyses by text type.

Another, larger group of texts that cannot be classified by date consists of those that occur in multiple manuscripts. In these cases, each of the manuscripts usually gives a slightly different reading. Unfortunately, the COM project does not include the different manuscript versions in most cases,2 though the planned fourth tranche of the COM will do so for the lyric poetry. Instead, a critical edition is usually used, which does not reproduce the full text of any one manuscript and therefore cannot be classified by date. While some prose and non-lyric poetry texts were excluded from the analyses by date for this reason, the problem is far more widespread in the lyric poetry. Few of the lyric poems survive in only one manuscript, and even in those cases, the date of that manuscript is rarely clear. Because of this, I do not attempt to classify the lyric poetry based on date.

2 There are a few exceptions, such as the Regles de Trobar, which survives in two manuscripts, both of which are included as separate texts in the COM (JDFH and JDFR). In this case, both texts in the COM were included in the analyses in this thesis.

In the prose and non-lyric poetry text types, there are 363 texts, totaling more than four and a half million words, which can be classified by date with some degree of certainty. The breakdown of texts, words, and word forms is given in Table 1.1.

Table 1.1 Number of Texts and Words by Date

Date        Texts  Words      Word forms
1050-1100   4      5,942      2,324
1100-1150   6      39,476     5,188
1150-1200   10     32,658     7,785
1200-1250   25     282,686    38,076
1250-1300   48     918,805    61,943
1300-1350   96     1,229,554  71,850
1350-1400   87     789,038    60,832
1400-1450   42     637,083    47,092
1450-1500   45     671,994    51,941
Total       363    4,607,236  213,674

Perhaps unsurprisingly, the earliest time periods are represented by fewer texts, while later time periods, particularly those of the fourteenth century, are represented by a much larger number of texts and words. The time periods after 1200 are of comparable size and provide a good basis for quantitative analysis. The earlier time periods, however, are represented by too few texts and too few tokens to act as the basis for meaningful or conclusive quantitative analysis; quantitative data from these groups of texts will be given, but the data found in these early groups is generally analyzed qualitatively.


In some cases, it is useful to consider dates within the text types. The breakdown of non-lyric poetry texts, words, and word forms by date is given in Table 1.2, and that of the prose texts is given in Table 1.3.

Table 1.2 Number of Non-Lyric Poetry Texts and Words by Date

Date        Texts  Words      Word forms
1050-1100   4      5,942      2,324
1100-1150   3      1,259      623
1150-1200   5      19,425     5,455
1200-1250   14     211,500    26,384
1250-1300   22     228,278    23,883
1300-1350   45     444,483    33,652
1350-1400   37     223,303    25,906
1400-1450   11     31,851     5,702
1450-1500   26     257,053    27,028
Total       167    1,423,094  95,034


Table 1.3 Number of Prose Texts and Words by Date

Date        Texts  Words      Word forms
1050-1100   0      0          0
1100-1150   3      38,217     4,743
1150-1200   5      13,233     2,956
1200-1250   11     71,186     14,992
1250-1300   26     690,527    46,787
1300-1350   51     785,071    49,210
1350-1400   50     565,735    44,678
1400-1450   31     605,232    44,620
1450-1500   19     414,941    31,381
Total       196    3,184,142  155,414

It is interesting to note that, while the number of datable non-lyric poetry texts that have survived (167) is similar to the number of prose texts (196), the number of words in the prose texts is much larger than in the non-lyric poetry texts. In addition, a comparison of Tables 1.2 and 1.3 shows that the proportion of prose and non-lyric poetry is not consistent across time periods. For example, the prose texts account for the overwhelming majority of words in the 1400-1450 time period, with the non-lyric poetry representing only 31,851, or 5.0%, of the 637,083 words in this period. On the other hand, the non-lyric poetry represents 444,483, or 36.1%, of the 1,229,554 words in the 1300-1350 time period.
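The percentages in the preceding paragraph can be recomputed from the word counts in Tables 1.2 and 1.3. The following minimal Python check uses only values transcribed from those two tables. It also confirms that the two tables sum to the totals in Table 1.1.

```python
# Word counts per period, transcribed from Table 1.2 (non-lyric poetry)
# and Table 1.3 (prose).
NON_LYRIC = {
    "1050-1100": 5_942, "1100-1150": 1_259, "1150-1200": 19_425,
    "1200-1250": 211_500, "1250-1300": 228_278, "1300-1350": 444_483,
    "1350-1400": 223_303, "1400-1450": 31_851, "1450-1500": 257_053,
}
PROSE = {
    "1050-1100": 0, "1100-1150": 38_217, "1150-1200": 13_233,
    "1200-1250": 71_186, "1250-1300": 690_527, "1300-1350": 785_071,
    "1350-1400": 565_735, "1400-1450": 605_232, "1450-1500": 414_941,
}


def non_lyric_share(period: str) -> float:
    """Percentage of a period's datable words that come from non-lyric poetry."""
    total = NON_LYRIC[period] + PROSE[period]  # equals the Table 1.1 figure
    return 100 * NON_LYRIC[period] / total


if __name__ == "__main__":
    for period in NON_LYRIC:
        combined = NON_LYRIC[period] + PROSE[period]
        print(f"{period}: {combined:>9,} words, non-lyric {non_lyric_share(period):5.1f}%")
    print("Grand total:", sum(NON_LYRIC.values()) + sum(PROSE.values()))
```

The two tables are kept as separate dictionaries, so the period totals are derived rather than retyped. Rounded to one decimal place, the share for 1400-1450 comes out at 5.0% and the share for 1300-1350 at 36.1%.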

Ideally, the distribution of prose and non-lyric poetry texts across the time periods would be consistent. Because this is not the case, linguistic trends found to correlate with the date of the text must be carefully considered in order to avoid possible confusion with text type differences, and vice versa.

1.7.2 Dialect

In addition to the date of a text and whether it is lyric poetry, non-lyric poetry, or prose, the texts in the Old Occitan corpus can be analyzed in terms of dialect. Old Occitan is often considered to be a collection of dialects rather than a single language. The differences between the dialects of Old Occitan in this period are, in some cases, as great as those between Old Occitan and Old French. While each city in the region can be said to have its own dialect, many of them cluster together to form dialect groups.

In order to classify texts by dialect, it is necessary first to consider the dialects of Old Occitan and their boundaries. For the purposes of this analysis, I consider seven dialects: the six major dialect areas of during the identified by Bec (1986) and others (Alpine Provençal, Auvernhat, Gascon, Languedoc, Limousin, and Provençal), as well as the Waldensian dialect from the of . The particular dialects do not play an ideological role in this thesis, and I do not intend to defend the seven dialects chosen for analysis. For the purposes of this thesis, a way of organizing the texts geographically is crucial in order to compare analyses by dialect with those by text type and date. The dialect demarcations, while less arbitrary than the fifty-year time periods discussed in the previous section, are nonetheless somewhat arbitrary. Because others have recognized the presence and importance of the geographical dialects of Old Occitan, I have chosen to follow their lead in terms of which dialects to consider. In the end, however, the choice of dialects is a functional one: it organizes the data so that I can test whether analyzing the texts in terms of geography and dialect accounts for variation found within and between texts.

The dialect divisions in Old Occitan are nonetheless difficult to draw. Because the geographic spread of diagnostic variants during the Old Occitan period is often unclear, most discussions of Old Occitan dialects take the dialect divisions and boundaries of the Modern Occitan dialects and project them backwards with some alteration. The six dialect areas in southern France that are used in this thesis are shown in the map in Figure 1.1 and are introduced alphabetically in the following paragraphs.

Figure 1.1 Map of Approximate (Old) Occitan Dialect Areas

Alpine Provençal, spoken in and near the slopes of the Alps to the north of the Provençal dialect areas, is characterized by the loss of intervocalic [t] and the change of [l] to [r] before labials. This dialect has many features in common with Franco-Provençal, such as the maintenance of Latin atonic [o], particularly in verb forms (Bec 1986).

Auvernhat, spoken in the center of the in the northern part of the Old Occitan region between the area and the Alpine Provençal dialect area, is characterized by a series of palatalizations affecting most consonants and a reduction of many of the Old Occitan diphthongs to simple vowels (Bec 1986).

The Gascon dialect area lies roughly between the Atlantic, the Pyrenees, and the Garonne River. The modern Gascon dialect is quite distinct from the other Occitan dialects. In this dialect, Latin [f] becomes [h], intervocalic [n] is lost, and [mb] and [nd] clusters are simplified to [m] and [n] respectively (Bec 1986). The Gascon dialect is also notable in its morphology, where it has a collection of verb endings not found elsewhere in Old Occitan. The Gascon dialect, as used here, includes what are often called the Bernais, Gascon, and Pyrenean dialect areas in one large dialect area. This decision is necessary because the editors of several texts, including L’abbaye de Lucq en Béarn au quatorzième siècle, the Boucheries d’, and Le Censier gothique de , identify the dialect as both Bernais and Gascon, or as Bernais and Pyrenean.

Limousin, one of the names by which the Old Occitan language as a whole was referred to during the Middle Ages, more properly refers to a smaller dialect area north of the Auvernhat dialects. The Limousin dialect includes the cities of and .

Limousin shares many features with Auvernhat; for example, intervocalic [l] becomes [r] or [w], and the first person singular verb ending is -e (Bec 1986).


The Languedoc dialect was spoken in the center of the Old Occitan region from Garonne to Ariège and includes many of the important cities of the time period, such as Toulouse. Languedoc is considered the most conservative dialect, avoiding the palatalization of Latin [kt] and the vocalization of [l] (Bec 1986). Languedoc is divided into eastern, western, northern, and southern varieties. The eastern varieties are very similar to the Provençal dialects, forming a .

The term Provençal, like Limousin, has at times been used to refer to the language spoken in the entire region during this time period and is sometimes used as a synonym for Old Occitan. More specifically, however, the Provençal dialect is the southeastern Old Occitan dialect area, including the southern part of the Old Occitan-speaking area from Digne and in the west to the Alps in the east. The Provençal dialect, unlike Languedoc, does vocalize [l] and does not have an -s ending for plurals.

The Waldensian dialect, also sometimes called Vaudois, is the variety of Old Occitan used in the writings of the Waldensians, an early evangelical Christian sect. This dialect was spoken outside modern-day France, in the Cottian Alps, in an area of what is today northern Italy (Harris 1984). Almost all of the texts in the Waldensian dialect are religious texts, many of them biblical translations. The Waldensian texts have been claimed to be written in a kind of artificial school language, intended to produce forms partway between the more central Old Occitan dialects and the actual spoken language of the audience. Borghi Cedrini (1980), however, argues that the Waldensian movement, which aimed to reach speakers in their own language, would not have been likely to aim at anything except the spoken forms of its audience, and that the striking similarities between the Waldensian texts are instead a result of the texts having been written within a very narrow time span. The Waldensian dialect, unlike the other dialects, is not only a geographic dialect but also represents a distinct social and chronological development. This may account for the important differences found in this thesis between the Waldensian dialect and the other dialects.

Bec (1986) considers the first six of these dialects in three groups: Northern Occitan, including Limousin, Auvernhat, and Alpine Provençal; Meridional Occitan, including Provençal and Languedoc; and Gascon. Bec (1986) also compares the linguistic features of Catalan with those of the Old Occitan dialects. Smith (1995) notes the same dialect grouping as Bec but, looking at a different set of linguistic features, pairs Gascon with Languedoc and groups Provençal with the northern dialects.

For the purposes of this analysis, I focus on the mid-level dialect areas identified by Bec. While there are differences between, for example, eastern and western Languedoc, I simply classify both as Languedoc. This allows the use of texts designated by editors as Languedoc as well as texts designated by more specific dialects or locations. On the other hand, I have chosen not to use the higher-level dialect groupings of Northern, Meridional, and Gascon for two reasons. First, the Provençal and Languedoc dialects are two of the largest in terms of the number of attested texts and words; collapsing them into a single dialect area would not yield an insightful comparison, as the remaining dialects would not have a comparable number of texts or words to allow any meaningful quantitative analysis. Second, the fact that Languedoc and Provençal are consistently treated as separate dialects throughout the literature makes me hesitant to consider only the three highest-level dialect groupings.

For some texts, the editors give a clear indication of the dialect of the text. For others, particularly the prose texts, a location is given rather than a dialect. In these cases, I assume that a text written in a specified location was written in the dialect characteristic of that location. For example, the text of Tristura was, according to Bertoni (1906), written in Montpellier. Because Montpellier is located within the Languedoc region, the text is classified as Languedoc. Although many of the texts in the COM can be securely located in one of these dialect areas, this number is much smaller than the number of texts that can be dated. Table 1.4 shows the number of texts, words, and word forms in each dialect.

Table 1.4 Number of Texts and Words by Dialect

Dialect            Texts        Words   Word forms
Alpine Provençal       5       58,159        8,708
Auvernhat              8       25,368        5,001
Gascon                45      770,032       50,243
Languedoc             59      753,710       58,249
Limousin               4        8,409        6,804
Provençal             30      344,136       30,504
Waldensian            12      368,243       19,129
Total                163    2,339,978      119,192


The Alpine Provençal, Auvernhat, and Limousin dialects are represented by far fewer words than the other dialects. These small groups of texts and small numbers of words make it difficult to make any claims about these dialects. In these groups, more than in much larger groups such as the Languedoc texts, differing patterns may simply reflect a kind of sampling error or the individual stylistic preferences of the author or scribe of a text.

It is sometimes useful to consider dialect within the text types. The breakdown of non-lyric texts, words, and word forms by dialect is given in Table 1.5, and that of the prose texts is given in Table 1.6.

Table 1.5 Number of Non-Lyric Poetry Texts and Words by Dialect

Dialect            Texts        Words   Word forms
Alpine Provençal       1       43,109        6,784
Auvernhat              1        2,184        1,021
Gascon                 6        1,143          613
Languedoc              9       29,815        6,890
Limousin               2        1,107          535
Provençal              8       20,510        4,136
Waldensian             1       13,615        2,732
Total                 30      111,483       15,309


Table 1.6 Number of Prose Texts and Words by Dialect

Dialect            Texts        Words   Word forms
Alpine Provençal       4       15,050        2,337
Auvernhat              6       23,184        4,291
Gascon                39      768,889       50,066
Languedoc             50      723,895       55,376
Limousin               2        7,302        6,368
Provençal             20      323,626       28,628
Waldensian            12      353,937       18,498
Total                133    2,215,883      112,349

The prose texts make up the vast majority of the texts that can be securely located by dialect, though some dialects are represented by far more texts and words than others. The number of non-lyric poetry texts, words, and word forms, however, is considerably smaller than that of the prose. Further, three of the non-lyric poetry dialect groups, Alpine Provençal, Auvernhat, and Waldensian, are each represented by a single text, while the Limousin dialect is represented by only two texts. Because the numbers of texts in these groups are so low, it is impossible to tell to what extent patterns found in these texts reflect the dialect of the text rather than the individual style or choices of the author. Combining the non-lyric texts in each dialect with the prose texts in each dialect increases the number of texts and mitigates this problem, but does not eliminate it.

In addition, as is the case in the groups of texts in each date category discussed in the previous section, the proportion of non-lyric poetry and prose varies widely across the dialects. The non-lyric poetry accounts for 43,109, or 74.1%, of the 58,159 words in the Alpine Provençal dialect, because the single Alpine Provençal non-lyric poetry text, Mystère de Saint Pons, is a very long text, far longer than the four prose texts in Alpine Provençal combined. On the other hand, the non-lyric poetry accounts for only 1,143, or 0.1%, of the 770,032 words in the Gascon dialect.
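The two percentages above follow directly from the word counts in Tables 1.4 and 1.5. As a quick arithmetic check, the short Python sketch below recomputes them; the dictionary and function names are hypothetical, chosen only for illustration, and only the two dialects discussed in the preceding paragraph are included.

```python
# Hypothetical names; word counts copied from Table 1.4 (all located words)
# and Table 1.5 (non-lyric poetry words) for the two dialects discussed above.
NON_LYRIC_WORDS = {"Alpine Provençal": 43_109, "Gascon": 1_143}
ALL_WORDS = {"Alpine Provençal": 58_159, "Gascon": 770_032}


def non_lyric_share(dialect: str) -> float:
    """Return the percentage of a dialect's located words that are non-lyric poetry."""
    return 100 * NON_LYRIC_WORDS[dialect] / ALL_WORDS[dialect]


for name in NON_LYRIC_WORDS:
    # One decimal place, matching the precision used in the text.
    print(f"{name}: {non_lyric_share(name):.1f}%")
```

Run as is, it prints 74.1% for Alpine Provençal and 0.1% for Gascon, matching the figures given in the text.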

In all, the non-lyric poetry texts account for only 111,483, or just under 5%, of the 2,339,978 words in the texts with secure locations or dialects. Although these non-lyric poetry texts are included in the analyses of dialect throughout this thesis, their small number means that the dialect analyses in the chapters that follow may be better conceptualized as analyses within the prose texts, rather than as analyses that cut across the Old Occitan corpus in a different way from the text type analysis and the chronological analysis.

In addition to the geographic dialects, another “dialect” of Old Occitan that should be mentioned is the supraregional dialect, often referred to as a koine, of the lyric poetry. The lyric poetry of the troubadours includes features of all of the dialects of Old Occitan side by side. It has been argued (e.g. Jensen 1995, Smith 1995) that the most likely location for the source of the lyric poetry is the Languedoc dialect area in the center of the Old Occitan-speaking region, with features from other dialects added.

Jensen (1995) argues that “the refinement process that a spoken language or dialect must undergo in order to attain the level of a medium capable of literary expression is in no small measure responsible for having obscured the geographic provenance of the troubadour koine” (p. 350). While the concept of “refinement” of a language may not be helpful, it seems quite clear that the language of the lyric troubadour poetry does not belong to any of the geographical dialects of Old Occitan, and thus lyric poetry is not included in the analyses of the Old Occitan texts by dialect.

1.8 Challenges in Studying a Historically Attested Language

Historical linguistics is often said to be the art of making the best use of bad data (e.g. Labov 1994). Because of this reliance on “bad data”, which is better called “incomplete data” (Joseph and Janda 2003) and which may be not only incomplete but also corrupted or inauthentic in some way, historical linguists can rarely, if ever, make statements with certainty. When working with historical data, linguists can be truly confident of little. Some things may be assumed with considerable confidence, but there is always the possibility that the data are misleading because of variation or textual practices we do not fully understand, or because of editing or copying issues we are unaware of. Because our data are always incomplete, and stand a real possibility of being ‘bad,’ the theories, reconstructions, and descriptions based on them will never reach absolute certainty.

Unlike those made by linguists working with modern languages and native speakers,3 the generalizations and descriptions made by a historical linguist are, or should be, always made with caution. Generalizations and descriptions are only as solid and as certain as the data on which they are based, and the data are invariably incomplete.

The data are undeniably incomplete in various ways. First, there tend to be gaps in time. Even for periods with textual evidence, as Herring et al. (2000) point out, the gaps in the kinds of data and texts available can be dramatic.

3 While linguists working with modern languages and native speakers may not ask the right questions to elicit complete data, the key distinction is that they have the opportunity to ask such questions of the speakers in their studies. Linguists working with historical data do not have such opportunities.

Gothic, for example, is attested almost entirely as translated text and is represented, essentially, only by a single author, while Old Tamil, Homeric Greek, and several other ancient languages are attested only in poetic verse. Fleischman (2000) also points out that even for the languages heavily studied by historical linguists, the available textual data are limited, and balanced sampling of text types is difficult or simply impossible. We are fortunate, then, that Old Occitan has such a wide variety of texts. The sampling is not balanced, as the prose texts, particularly the administrative prose texts, are quite numerous, but many types of texts are attested over a wide period of time. The earliest texts are poetry and charters, while religious texts are much better represented in the second half of the Old Occitan period. Even though the texts are not exactly balanced, the fact that we have such a variety of texts makes Old Occitan, in some ways, a more reliable candidate for study than languages which are represented by only one author or text type.

Some have argued that there can be too much data, and that such a situation can get in the way of a clear understanding of what is going on. Lass (1997) points out that too much data can sometimes be a hindrance in that it may muddle the picture by making it harder to know what to take as input for the ‒ or indeed what to consider when describing a language change. For example, in the study of the glide-initial diphthongs of Old Occitan (see Chapter Four), I take all of the data from the COM seriously and consider all cases in which this diphthongization occurred in order to account for it accurately. The results of this data-mining and comparison were both overwhelming and discouraging: the conditioning environments that had been suggested all failed to account for the attested instances of diphthongs. The diphthongs developed from both low mid and high mid vowels, in both content words and function words, in both open and closed syllables, and before and after every single sound in Old Occitan. Working through and considering all of these data, however, brought a much clearer understanding of the development and spread of these diphthongs. In cases like these, where there are enough data to be rather confusing, careful sifting of the data is crucial, and ignoring existing data, as the focus on lyric poetry has often done, seems negligent. We cannot have too much data; the more data we have, and use, the more complete and accurate the descriptions of historically attested languages and language change will be.

In the case of Old Occitan, while there is a large corpus of texts, we know that many texts have been lost. There are references to poets from whom we have no poems at all and references to texts of which we have never found manuscripts, and many scholars theorize that the narrative tradition in Old Occitan was much more prolific than the surviving texts and that a “once sizable corpus suffered major textual losses” (Fleischman 1995, p. 174). Researchers interested in the language of Old Occitan must work within this finite corpus and, perhaps more importantly, with only written texts.

1.8.1 Concerns when Studying Written Texts

Because written texts are the only data we have for Old Occitan, it is important to consider the validity of using such texts as linguistic data in the absence of native speakers and spoken data. Most linguists regard writing as secondary to language, as it is further removed from the source and loses some of the information that is available when speakers interact face to face. In addition, some constructions may occur only rarely in speech and be essentially limited to written texts (Joseph 2000).

Some research has been done on the relationship between different registers and speech (e.g. Biber et al. 1998), but the correlations made in these studies tend to be between spoken Modern English and the various genres and registers of historical texts. Because we have no recordings of spoken language or native speakers from the Old Occitan period to compare with the extant written texts, it is impossible to determine the relationship between spoken and written language for Old Occitan, or for any other language attested only in writing. Though some texts include transcripts of speech, these transcripts rarely, if ever, record the speech verbatim; instead, they give the basic meaning of the speeches. Similarly, for Early Modern English, some scholars believe that Shakespeare wrote in conversational language that more or less accurately represents the speech of his time, and that his dramatic language can therefore be used to reconstruct the speech of his time period (Rissanen 1986), but this is only an assumption, and even Shakespeare’s plays are attestations of language in written form rather than direct evidence of speech.

Old Occitan spelling is conservative. Even though the texts considered here include the earliest attested in the language, insofar as Old Occitan was perceived as a “different language” from Latin, it is reasonable to assume that writers may have modeled their spellings in part on those of the corresponding Latin words. Their spellings may thus be affected by Latin spellings (based on Latin pronunciations) that do not represent current pronunciation.

Since writing mirrors language, however variously, it provides us with clear evidence of linguistic change in gross outline (Anttila 1989). Even taken abstractly, writing systems do give clues as to how words were pronounced and what sounds were contrastive in the language at a given stage. Sets of letter combinations that express the same pronunciation can be found and contrasted with others, giving a good idea of the distinctive sounds of the language. This does not, however, tell us what the exact pronunciation was.

Working with written texts is further complicated by issues related to textual transmission, as discussed in Section 1.3.1. Depending on the length and complexity of the textual tradition and how careful the scribe(s) or author(s) were, a text may contain many errors. This is particularly the case when the manuscripts are copied by non-native speakers, as was sometimes the case with the lyric poetry chansonniers. The problem is much less extreme for the prose texts, especially the charters and other administrative texts, because the manuscripts we have of these texts are often not copied at all, but are assumed to be the originals and are preserved in the location in which they were written.

The problem of exactly what is transmitted and how it relates to what was actually said, however, remains for historical linguistics in general and for studies of Old Occitan texts in particular. In any case, questions of who authored the texts, and why and when, interact with questions of textual transmission and scribal practice; interpreting historical data from texts with certainty thus requires a great deal of information, which we often do not have (Bishop et al. 2007).

1.8.2 Authenticity

Any feature or construction found in historical texts is subject to questions of authenticity, that is, whether its use corresponds to actual spoken language usage and to some part of the grammar internalized by native speakers. The best evidence of authenticity would, of course, be native speakers themselves. In synchronic linguistics, experiments draw on the judgments of native speakers. Similarly, if there were native speakers who could be asked to identify and use the structures whose authenticity is in question, the question of the status of those features and structures could be put to rest. Speakers could tell us whether these features have any linguistic reality at all, are marginal, or are a central part of the language. But the “dead languages” studied by historical linguists, by definition, have no native speakers. The authenticity of a feature or structure that is found in the historical data must then be discovered and defended in a different way.

One common strategy for avoiding the possibility that constructions, especially syntactic constructions, are merely literary conventions is to give greater weight to prose texts, or to consider them exclusively. This has not been the case in Old Occitan, where the focus has instead been on the lyric poetry. The strategy has been used, however, in studies of other languages in which poetry has been considered less authentic, such as Old English. It is based on the assumption that the “poetic process itself involves stretching grammatical and lexical boundaries”, which can be useful for exploring the limits of the grammar of a language, but not for validating the status of a given feature (Joseph 2000, p. 312). In the same volume, Herring et al. (2000) point out that translated texts are considered least authentic, verse texts slightly more so, and prose texts that reflect the spoken language of the time most authentic and desirable; but even this is limited, as textual languages are written and can provide no direct evidence of spoken communication (see Section 1.8.1). Hock (2000), however, shows that this approach based on the type of text has its own pitfalls, and argues that the full range of syntactic variants can be found and accurately analyzed only by considering the largest possible variety of texts. This thesis aims to avoid this pitfall by considering the entire Old Occitan corpus, covering all types of text in a language that is preserved in a wide variety of text types and genres.

Joseph (2000) examines different kinds of evidence for authenticity: if a feature or construction can be corroborated by other evidence, it can be accepted as an authentic feature of the language of the writer, rather than an error or literary convention. The most readily available evidence is usually the frequency and systematicity with which the feature is attested. Joseph (2000) uses the example of the substitution of the letter upsilon where Classical Greek has the omicron-iota diphthong, a substitution that is widespread during the Hellenistic period (300 B.C.-300 A.D.) and confined to this diphthong alone; upsilon is not used in place of other vowels. “Reverse spellings”, in which the omicron-iota diphthong sometimes occurs where Classical Greek has upsilon, provide further evidence. Together, this evidence confirms that the spelling of upsilon where omicron-iota would be expected must be taken seriously, and that these spellings show that the sounds represented by these letters had merged in this period (Joseph 2000).

If “authentic” means recording actual speech, very few, if any, of the texts in the Old Occitan corpus could be considered authentic. The notion of authenticity should properly refer instead to whether a text was a viable linguistic production, written by native speakers and able to be understood by native speakers. Fundamentally, the texts that we have represent authentic linguistic productions of speakers, and the language found in these texts should thus be considered authentic. It is important to note, however, that context-specific practices, for example in the charters, notarial texts, and certain kinds of poetry, can produce unusual patterns within the corpus. Though these texts may include features which are exceptional in various ways, they are all authentic linguistic productions.

Critical editions can present a problem for authenticity, as some critical editions combine features and constructions of different texts from different dialects or time periods into a single text that, as a whole, was not the production of any one speaker or author. A critical edition created in this way, if considered as a whole, may not be a fully authentic text, but the features found within it are generally taken from some text, and thus, on a more local level, even the features found within these texts should be considered authentic.


1.9 Variation in Old Occitan Texts

Like most languages before standardization, Old Occitan is awash in variation. In arguing for a variationist view of language change, Milroy (1992) writes concerning Middle English (ME):

One of the advantages of studying ME is that its written forms are highly variable…not only is there considerable divergence between different texts, there is also normally great variability (particularly in spelling and inflectional forms) within the texts. Thus, ME language states, being so variable, should in principle be suited to the same kind of analysis that we use in present-day social , and by using variationist methods we should be able to explore at least some of the constraints on variation that might have existed in ME. (p.131)

The advantage Milroy attributes to the study of Middle English applies equally to the study of Old Occitan: as even a superficial look at the texts makes clear, there is a great deal of variation not only between texts but also within each text.

Some of the variation simply reflects orthographic differences, such as the and spellings for the palatal [ʎ], while other variation points to dialect differences, changes in progress, individual variation, and influence from other languages via non-native speakers. The challenge, of course, is distinguishing among these kinds of variation. Using the full extant corpus of Old Occitan, including the charters and other legal texts, which can often be located geographically more securely, it should be possible to tease apart some of the dialectal and individual variation. In addition, if, as suggested above, poetry is more archaic than prose, a systematic comparison of the text types may be invaluable in distinguishing stable variation from change in progress.


It is easy to approach Old Occitan, or similar languages, by attempting to reduce and make sense of the variation. Fleischman (2000), however, points out that authentic medieval texts are heterogeneous at multiple levels: inconsistent and variable , variable morphosyntax, text type- and genre-specific features, and dialectal variation are present across different manuscripts of the same text and within a single manuscript. She criticizes attempts to uncover a fictional structural homogeneity in Old French as being driven by ideologies of homogeneity and that are simply not warranted, and that give a faulty description of both Old French itself and the causes and effects of the variation in Old French texts. Though fewer attempts have been made to show homogeneity in Old Occitan, the causes and effects of the variation in Old Occitan texts are not yet well understood.

According to introductory books on reading and singing troubadour poetry aloud, all of the letters on the page are to be taken literally and pronounced (Paden 1998, Wayland 1982). If we take the spelling at all seriously, however, we quickly conclude that there is an enormous amount of variation in the possible pronunciations of words. In many cases, though, the spelling variations found in Old Occitan are no more extreme than the phonetic variation among speakers of modern-day English; context allows a reader to determine what the word is. Nasals are particularly variable; and are sometimes used interchangeably, omitted, or added superfluously. In these cases, it seems quite possible that the nasal variability reflects real variability in the language. It is relatively clear that this is stable variation, given its consistent frequency throughout the period, at least in the lyric poetry texts that have been analyzed, and the lack of observed change between Old Occitan and the dialects of Modern Occitan (Anglade 1977, The Provençal Database).

Some cases of variation are very complex. The palatal and , for instance, can be represented by a bewildering array of letters, and scholars have claimed with some confidence that some spellings and uses represent geographical variation while others represent change in progress (Anglade 1977). In addition, social variation is likely to be an important source of variation in historical data, but it is rarely overtly recognized and is very difficult to tease apart from regional variation and individual scribal practice. These different sources of variation, together with the variation introduced by the metrics of lyric poetry, make any study of the phonology of Old Occitan complex, but the study of the full corpus and the comparison of text types will help distinguish among them.

1.10 Dissertation Overview

The purpose of this dissertation is to consider whether there are differences in the language patterns used in three types of text (lyric poetry, non-lyric poetry, and prose) and, if differences are found, to consider how significant they are. To do this, I consider linguistic features of Old Occitan both qualitatively and quantitatively in the Concordance de l’Occitan Médiéval. The three features considered represent different areas of grammar. The first is morphosyntactic: the formation of comparative adjectives as either a word or a phrase (Chapter Two). From there, I consider a case of word formation, the derivation of adjectives (Chapter Three). The final linguistic feature discussed in this thesis is a sound change of the Old Occitan period: the diphthongization of mid vowels (Chapter Four). This discussion of diphthongization continues my previous work on the development of this sound change (Wilson 2010). This phonological feature, which probably occurred not long before the start of the Old Occitan period, allows a consideration of whether text type affects the attestation of innovative variants. In addition to comparing patterns of usage in lyric poetry, non-lyric poetry, and prose, I consider the texts by date and dialect in order to determine to what extent these parameters affect the patterns of usage.


Chapter Two: Comparative Adjectives

2.0 Comparative Adjectives

The first feature to be investigated is the expression of the comparative degree in adjectives. Like English and several other Indo-European languages, Old Occitan has two methods for forming comparative adjectives. This chapter considers the pattern of usage of the analytic and synthetic formations of comparative adjectives.

2.1 Comparative Adjectives in Grammars and Descriptions of Old Occitan

The most common method of forming comparative adjectives, shown in (1), is a simple analytic formation of plus ‘more’ followed by the positive form of the adjective.

(1) plus gen ‘more noble; nobler’

The analytic formation with plus was inherited from Latin, occurring from the time of Plautus, though it became much more common in the period (Anderson and Rochet 1979, Lausberg 1972). In addition to analytic comparative adjective formations with plus, Anglade (1977) notes that the Leys d’Amors, a mid-fourteenth-century set of grammatical rules for writing lyric troubadour poetry, states that the analytic comparative can also be formed with mais, inherited from Latin magis. Old Occitan is the only Romance language to retain analytic comparatives with both mais and plus (Lausberg 1972, p. 82). Though the analytic formation with magis was more common in Latin than that with plus, formations with mais are rare in Old Occitan. When they do occur, they follow the same pattern as those with plus; where found, they have therefore been included with the plus formations in a single category of analytic comparative formations. Beyond this, the analytic comparative formation, as noted by Jensen (1976), presents little of interest, especially from a morphological point of view.

Nouns and adjectives in Old Occitan are inflected for two cases: the nominative case, which was generally used for the subject of a sentence, and the oblique case, which was used for the direct object, the object of a preposition, and, less frequently, possession. This two-case system represents a simplification of the case system of Latin, partly as a result of sound changes that regularly deleted the final vowels that once marked Latin cases such as the ablative. Nouns and adjectives are also inflected for number, and adjectives are further inflected for gender, though nouns in Old Occitan decline in a somewhat different pattern from adjectives. The basic adjective declensions are given in (2), with the examples of bon ‘good’, which forms the feminine with -a, and tal ‘such’, which forms the feminine without the addition of -a.1

1 There are also further sub-classes of adjectives, such as that represented by fals ‘false’, which is invariant in the masculine.

(2) Adjective declensions (from Paden 1998)

bon ‘good’
                Masculine             Feminine
                Singular   Plural     Singular   Plural
Nominative      bons       bon        bona       bonas
Oblique         bon        bons       bona       bonas

tal ‘such’
                Masculine             Feminine
                Singular   Plural     Singular   Plural
Nominative      tals       tal        tals       tals
Oblique         tal        tals       tal        tals
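The paradigm in (2) can also be read as a lookup table keyed by gender, number, and case. The Python sketch below restates the forms in (2) and lists every cell in which a given form appears, which makes the syncretism of the two-case system visible. It is a minimal illustration only: the names PARADIGMS, decline, and cells_with are hypothetical, and the forms are simply copied from (2).

```python
# Hypothetical names; forms copied from the paradigm in (2) (after Paden 1998).
# Keys are (gender, number, case): "m"/"f", "sg"/"pl", "nom"/"obl".
PARADIGMS = {
    "bon": {
        ("m", "sg", "nom"): "bons", ("m", "pl", "nom"): "bon",
        ("m", "sg", "obl"): "bon", ("m", "pl", "obl"): "bons",
        ("f", "sg", "nom"): "bona", ("f", "pl", "nom"): "bonas",
        ("f", "sg", "obl"): "bona", ("f", "pl", "obl"): "bonas",
    },
    "tal": {
        ("m", "sg", "nom"): "tals", ("m", "pl", "nom"): "tal",
        ("m", "sg", "obl"): "tal", ("m", "pl", "obl"): "tals",
        ("f", "sg", "nom"): "tals", ("f", "pl", "nom"): "tals",
        ("f", "sg", "obl"): "tal", ("f", "pl", "obl"): "tals",
    },
}


def decline(lemma: str, gender: str, number: str, case: str) -> str:
    """Look up one cell of the paradigm."""
    return PARADIGMS[lemma][(gender, number, case)]


def cells_with(lemma: str, form: str) -> list[tuple[str, str, str]]:
    """List, in sorted order, every (gender, number, case) cell that shows `form`."""
    return sorted(key for key, value in PARADIGMS[lemma].items() if value == form)


print(cells_with("bon", "bons"))
```

For bon, the form bons occupies both the masculine nominative singular and the masculine oblique plural, so the case of a masculine form in -s cannot be read off the form alone.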

The other method for forming comparatives is a synthetic formation. The synthetic comparative adjectives are formed by adding a different suffix to the root for each of the cases, -er in the nominative case and -or in the oblique case.2 This formation is shown in (3).

(3) genser (nominative) / gensor (oblique) ‘more noble; nobler’

It is important to note that while there are many adjectives which form their comparatives analytically but not synthetically, the reverse is not true. The analytic comparative formation is available for all adjectives, while the synthetic formation is restricted to a small group of adjectives and, for some of those adjectives, is further restricted, as shown below. The quantitative analysis in this chapter focuses only on the set of adjectives for which both an analytic form and a synthetic form are available.

2 The case distinction as described here is simplified. Some adjectives use the oblique suffix -or for both cases.

The positive and synthetic comparative forms of this subset of adjectives have been given by Anglade (1977), Crescini (1926), Grandgent (1905), Lausberg (1972), and others in their discussions of the comparative adjectives of Old Occitan. The tables given in these grammars and descriptions are fairly similar, although not all adjectives are included in all grammars. Table 2.1 is largely based on Grandgent’s (1905) grammar, which is arguably the most complete. Table 2.1 does, however, include forms missing from his grammar but included in others. In this table, an asterisk (*) signifies that the form given as the direct ancestor of the Old Occitan is unattested in Latin and is instead a reconstructed Vulgar Latin form; square brackets indicate the form from which the Vulgar Latin is derived. Parentheses indicate that the synthetic comparative form is not built on the listed positive form but is suppletive. The suppletive forms are listed after those formed on the same root; the table is otherwise alphabetical by the Old Occitan positive. This order will be used for all tables of comparative adjectives in this chapter. Dashes in the columns of synthetic forms indicate that the expected form is not attested in the Old Occitan corpus.


Table 2.1 Positive and Synthetic Comparative Adjectives (based on Grandgent 1905)

Gloss             Latin               Old Occitan   Old Occitan comparative   Old Occitan comparative
                                      positive      nominative case           oblique case
‘high’            altus               aut           –                         ausor
‘beautiful’       *bellatus           bel           bellaire/bellzer          bellazor
‘strong’          fortis              fort          –                         forsor
‘noble’           genitus             gen           genser                    gensor
‘great’           grandis             gran          –                         gragnor
‘heavy’           *grevis [>gravis]   greu          greuger                   –
‘great’           grossus             gros          gruesser                  –
‘dirty’           laið3               lai           laiger                    –
‘generous’        largus              larc          –                         largor
‘easy’            levis               leu           leuger                    –
‘long (in time)’  longus              lonc          –                         lonhor
‘bad’             nugalis             nualhos       –                         nualhor
‘poor; bad’       sordidus            sorde         sordier                   sordejor
‘good’            (bonus)             (bon)         melher                    melhor
‘great’           (grandis)           (gran)        majer                     major
‘bad’             (malus)             (mal)         pejer                     pejor
‘much’            (multus)            (molt)        –                         plusor
‘little’          (paucus)            (pauc)        menre                     menor

While the analytic comparative formation presents little of interest morphologically, the synthetic comparative adjectives have received significant attention in grammars of Old Occitan because they are the only adjectives in which an imparisyllabic inflection has survived into Old Occitan (Crescini 1926, Jensen 1976).

3 This is the form given by Grandgent as the Latin ancestor of the Old Occitan lai, which is also spelled laid, laig, or lait. Other etymologies are given elsewhere: Anglade (1977) gives Vulgar Latin *laidus, and Raynouard’s Lexique Roman gives the verb *lædere, presumably via a participle or deverbal formation.

This inflectional distinction, however, is being lost during the Old Occitan period as analogy aligns the synthetic comparative adjective declension with that of the so-called third adjectival declension. The effect of this analogical change is to level out the imparisyllabic declension, which is unattested elsewhere in Old Occitan adjectives, and the change was likely motivated by the similarities between the two declensions, especially the minimal difference between their masculine and feminine forms (Jensen 1976). After the analogy has taken place, the synthetic adjectives no longer use the -or and -er endings to distinguish case, but instead use the alternation between [s] and [Ø] to mark case as well as number and gender, as with the other adjectives shown in (2) above. In other words, the case markings for the synthetic comparative forms are in flux during the Old Occitan period, and this has drawn significant attention to these forms in grammars of Old Occitan.

In terms of form, almost all of the synthetic comparative adjectives can be derived more simply from the Latin synthetic comparative forms, through the sound changes that occurred in the centuries between Latin and Old Occitan, than they can be explained as transparent Old Occitan creations formed by adding a suffix to the attested positive adjective. Grandgent (1905) notes in his discussion of the synthetic comparative adjectives that these forms “preserve their old comparative in -ior” (p. 96). Though some of the synthetic comparative adjective formations are derived from Vulgar Latin forms rather than from the familiar Classical Latin forms, in Old Occitan these forms are relics of a once-productive affixation. The derivation of the synthetic comparative adjectives from their Classical Latin or Vulgar Latin ancestors through sound change and analogy is discussed in detail by Jensen (1976) and Grandgent (1905), so I do not treat it in detail here.

The grammars also note that some of the synthetic comparative adjectives, such as melhor and major, are very frequent, while others, such as bellazor and leuger, occur very rarely. It is not clear from these descriptions, however, whether this means that comparative uses of the adjectives bel and leu are rare, or whether the comparative forms of these adjectives are more often the analytic formations plus bel and plus leu.

That is, do these adjectives simply occur rarely in any comparative formation? Or is the analytic comparative formation common while the synthetic comparative formation is rare? When the data from the COM is considered, there are clear examples of both kinds of infrequency. Some adjectives, such as nuallos, are very infrequently found in any comparative construction, analytic or synthetic. Other adjectives, such as bel, are frequently found in analytic comparative adjective formations but rarely in synthetic comparative adjective formations. These frequency differences are discussed in detail in the sections that follow.

Interestingly, ausor is described by Anglade (1977) as being one of the more frequent synthetic comparative adjectives, but by Jensen (1976) as being an infrequent synthetic comparative adjective, of which only “vestiges” are found in Old Occitan (p. 119). Though these accounts seem contradictory, the explanation for the inconsistency in descriptions of the use of ausor becomes clear when the usage of comparative adjectives by text type is considered, as discussed below in Section 2.5.

This chapter considers the analytic and synthetic comparative adjective formations quantitatively, breaking down the usage of comparative adjectives by date, dialect, text type, and adjective. As a review of the grammatical literature on Old Occitan shows, this chapter represents the first time such quantitative work has been done, as well as the first time that factors such as date and text type have been considered in conjunction with the comparative adjective formation. This close examination gives a much better picture of the use of comparative adjectives and the distribution of synthetic and analytic adjective formations in Old Occitan. Before breaking down the usage of comparative adjectives in this way, however, I present a quantitative analysis of comparative adjective usage in the entirety of the Concordance de l’Occitan Médiéval.

2.2 Comparative Adjectives in the Concordance de l’Occitan Médiéval

2.2.1 Methodology

In order to test the descriptions of the formation of comparative adjectives and their relative frequency of usage, I searched the COM exhaustively for tokens of the comparative adjectives for which both an analytic formation and a synthetic formation were available. First, I searched for all spelling variants of the synthetic comparative adjectives in Table 2.1. In addition, I searched for any other possible synthetic comparative formations ending in -er or -or. This search simply gave the list of all words which ended in -er or -or. All forms ending in -er or -or that were not comparative adjectives were then removed from the list. For example, many Old Occitan nouns are formed with an -ier suffix that can denote the agent of an action as well as an abstraction. When words such as these were removed, the remaining list included only the adjectives listed in Table 2.1 above. No examples were found in which the comparative suffix is added to an adjective root not included in Table 2.1, which gives additional evidence that the synthetic comparative adjective formation is not productive. The unproductive nature of the synthetic comparative adjective formation is discussed further below.
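The two-step search can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually run on the COM: the variant table is a hypothetical, partial excerpt built from forms cited in this chapter, the input is assumed to be already tokenized, and the helper name `find_synthetic` is invented for the example. Known variants are counted under their positive adjective (which also pools spelling variants and case forms), and any other -er/-or form is set aside for manual review, which is where nouns such as senhor or agentive -ier forms would be discarded.

```python
from collections import Counter

# Hypothetical, partial map from attested spellings (both case forms)
# to the positive adjective; a full map would list every variant by hand.
VARIANT_TO_POSITIVE = {
    "ausor": "aut", "aussor": "aut",
    "genser": "gen", "gensor": "gen",
    "melher": "bon", "melhor": "bon",
    "majer": "gran", "major": "gran", "mayor": "gran",
    "pejer": "mal", "pejor": "mal",
    "menre": "pauc", "menor": "pauc",
}


def find_synthetic(tokens, variants=VARIANT_TO_POSITIVE):
    """Count known synthetic comparatives and collect other -er/-or forms.

    Returns (counts by positive adjective, forms left for manual review).
    """
    counts = Counter()
    to_review = set()
    for token in tokens:
        form = token.lower()
        if form in variants:
            counts[variants[form]] += 1
        elif form.endswith(("er", "or")):
            to_review.add(form)  # e.g. nouns such as senhor or cavalier
    return counts, to_review


counts, to_review = find_synthetic(
    "lo senhor e lo cavalier an melhor e mayor gensor genser menre".split()
)
print(counts)     # Counter({'gen': 2, 'bon': 1, 'gran': 1, 'pauc': 1})
print(to_review)  # the two nouns, left for manual review
```

Note that menre would be missed by a suffix search alone, which is why the known variants are checked first.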

All of the synthetic forms of each adjective were put into a single category, including spelling variants and the two case forms. The case forms were collapsed for three reasons. First, this made them much easier to compare quantitatively with the analytic adjective formation, as the formal case distinctions do not mark the same categories in the two sets of adjectives. The basic adjective declension of bon ‘good’, given as part of (2) above, is reproduced here as (4), showing how identical forms appear in the nominative and oblique cases.

(4) Adjective Declension of bon ‘good’ (from Paden 1998)

              Masculine           Feminine
              Singular  Plural    Singular  Plural
Nominative    bons      bon       bona      bonas
Oblique       bon       bons      bona      bonas


For masculine adjectives, for example, the nominative singular and the oblique plural are identical, while the nominative plural and oblique singular are identical. For feminine adjectives, the nominative and oblique case forms are identical in both the singular and the plural. This declension pattern in the positive adjective makes it impossible to quantitatively analyze the case forms separately by simply looking at the forms. Because the formal differences in the positive and synthetic comparative adjectives mark different distinctions within the Old Occitan grammatical system, it was simplest to combine them into a single measure of tokens of each formation.
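The syncretism that makes case unrecoverable from form alone can be checked mechanically. The sketch below uses hypothetical helpers (not part of the original analysis) that build the bon-type paradigm in (4) from a stem and then list, for each surface form, the cases it can express. It covers only the bon type; tal-type adjectives would need their own paradigm.

```python
from collections import defaultdict


def bon_type_paradigm(stem):
    """Forms of a bon-type adjective following the pattern in (4)."""
    return {
        ("masc", "sg", "nom"): stem + "s",
        ("masc", "pl", "nom"): stem,
        ("masc", "sg", "obl"): stem,
        ("masc", "pl", "obl"): stem + "s",
        ("fem", "sg", "nom"): stem + "a",
        ("fem", "pl", "nom"): stem + "as",
        ("fem", "sg", "obl"): stem + "a",
        ("fem", "pl", "obl"): stem + "as",
    }


def cases_by_form(paradigm):
    """Map each surface form to the set of cases it can express."""
    readings = defaultdict(set)
    for (_gender, _number, case), form in paradigm.items():
        readings[form].add(case)
    return dict(readings)


for form, cases in cases_by_form(bon_type_paradigm("bon")).items():
    print(form, sorted(cases))
# bons ['nom', 'obl']
# bon ['nom', 'obl']
# bona ['nom', 'obl']
# bonas ['nom', 'obl']
```

Every form is compatible with both cases, so a count of surface forms cannot be split by case without syntactic analysis.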

A second reason to combine the case forms in this analysis is that even some of the more common synthetic comparative adjectives do not have a formal case distinction. Ausor ‘higher’, for example, appears with only the -or ending that is formally described as the oblique case, except in two examples from the Regles de Trobar discussed in Section 2.5 below. The word occurs frequently, however, in both nominative and oblique syntactic positions. Finally, the case forms were collapsed because, even for the synthetic comparative adjective formations that have a formal case distinction, this distinction is not always followed. In addition, because of the analogy described in the previous section, the use of the -er and -or endings to distinguish the case of synthetic comparative adjectives is being lost during the Old Occitan period. The nominative and oblique case forms therefore cannot be reliably distinguished on the basis of the form of the adjective alone.


Once all of the synthetic comparative adjectives in the corpus were found, I searched for all attestations of those adjective roots being used in an analytic adjective formation. This was done by pulling out all cases of plus4 and mais and searching by hand for the cases in which any of the adjectives which have a synthetic comparative adjective formation were used with plus or mais to create an analytic comparative adjective.
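A first automatic pass over the plus and mais hits might look like the following. This is a simplified sketch under stated assumptions: it inspects only the word immediately after plus (or a spelling variant from note 4, or mais), and the small form tables are hypothetical excerpts. Because the adjective need not be adjacent and these words also occur in other constructions, every hit would still need to be checked by hand, as in the search described above.

```python
PLUS_FORMS = {"plus", "pluz", "pluss", "pluus", "plius", "pus", "mais"}

# Hypothetical excerpts of the form tables (form -> positive adjective)
POSITIVE_FORMS = {"aut": "aut", "auta": "aut", "bel": "bel", "bela": "bel",
                  "fort": "fort", "gran": "gran"}
SYNTHETIC_FORMS = {"ausor": "aut", "aussor": "aut", "mayor": "gran",
                   "major": "gran"}


def classify_plus_hits(tokens):
    """Label each 'plus + adjective' bigram as analytic or doubly marked."""
    words = [t.lower().strip(",.:;!?") for t in tokens]
    hits = []
    for prev, word in zip(words, words[1:]):
        if prev not in PLUS_FORMS:
            continue
        if word in SYNTHETIC_FORMS:
            hits.append(("doubly marked", SYNTHETIC_FORMS[word]))
        elif word in POSITIVE_FORMS:
            hits.append(("analytic", POSITIVE_FORMS[word]))
    return hits


print(classify_plus_hits("e foron al cosselh li baro plus ausor:".split()))
# [('doubly marked', 'aut')]
```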

2.2.2 Results

In the entire COM corpus, 8,726 tokens of comparative adjectives for which both the analytic formation and the synthetic formation were available were found: 1,508 analytic formations, 7,203 synthetic formations, and 15 doubly marked comparative adjectives. The doubly marked comparative adjectives are discussed in Section 2.2.3 below. Taken as a whole, then, the synthetic comparative adjective formation is used in 82.55% of the relevant comparative adjective formations in Old Occitan, much more frequently than the analytic comparative adjective.
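The pooled figure is the synthetic count divided by all relevant tokens. As a quick check of the arithmetic, a hypothetical helper that keeps doubly marked tokens in the denominator (which reproduces the per-adjective percentages in Table 2.2) gives:

```python
def synthetic_share(analytic, synthetic, doubly_marked):
    """Percentage of synthetic forms among all relevant comparative tokens."""
    total = analytic + synthetic + doubly_marked
    return 100 * synthetic / total


# Corpus-wide totals reported above for the COM
assert 1508 + 7203 + 15 == 8726
print(f"{synthetic_share(1508, 7203, 15):.2f}%")  # 82.55%
```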

This figure is misleading, however, as it gives the impression that the synthetic forms are consistently widely used. In fact, not all of the adjectives in question follow this pattern, and the extremely high frequency of some of the adjectives compared to others skews the distribution of analytic and synthetic comparative adjectives.

Considering each adjective individually is much more revealing. Rather than finding a consistent percentage of synthetic comparative adjectives around 82%, the percentages vary from 1.64% to 100.00% when each adjective is considered separately. The distribution of each individual adjective in the COM corpus is given in Table 2.2. The oblique case form of the synthetic comparative adjectives is used here for reference, even if the synthetic comparative adjective does not occur in this form in the texts.

4 As well as all of the possible spelling variants of plus: pluz, pluss, pluus, plius, pus, etc.


Table 2.2 Analytic and Synthetic Comparative Adjectives in the Entire COM Corpus

Positive adj./        Gloss         Analytic  Synthetic  Doubly marked  Total    Percentage
synthetic comp. adj.                forms     forms      forms          forms    synthetic forms
aut/ausor             ‘high’        203       115        2              320      35.94%
bel/bellazor          ‘beautiful’   229       50         0              279      17.92%
fort/forsor           ‘strong’      359       6          1              366      1.64%
gen/gensor            ‘noble’       79        294        2              375      78.40%
greu/greugor          ‘heavy’       73        4          0              77       5.19%
gros/greussor         ‘great’       27        9          0              36       25.00%
laitz/laigor          ‘dirty’       11        2          0              13       15.38%
larc/largor           ‘generous’    38        20         0              58       34.48%
leu/leuger            ‘easy’        94        3          0              97       3.09%
lonc/lonhor           ‘long’        105       22         1              128      17.18%
nuallos/nualhor       ‘bad’         0         2          0              2        100.00%
sorde/sordejer        ‘poor’        0         50         1              51       98.04%
bon/melhor            ‘good’        25        1,833      4              1,862    98.44%
gran/major            ‘great’       217       3,006⁵     2              3,225    93.20%
mal/pejor             ‘bad’         39        287        1              327      87.76%
molt/plusor           ‘much’        1         517        0              518      99.81%
pauc/menor            ‘little’      8         1,034      2              1,044    99.04%
Total                               1,508     7,203      15             8,726    82.55%

5 Of these forms, one is gragnor, the outcome of the inherited synthetic form Latin grandior. All other instances, which comprise the vast majority of the comparative forms of gran, are forms of the suppletive comparative major.

It is immediately clear that the full distribution of analytic and synthetic comparative adjective formations cannot be accurately described by the overall proportion of analytic and synthetic comparative formations given above. Instead, each adjective has a different relative frequency of analytic and synthetic uses. The comparative of some adjectives, such as sorde, occurs only in the synthetic form. The analytic comparative plus sorde is not found anywhere in the COM, though the positive adjective sorde is itself attested several dozen times. Other adjectives, such as fort ‘strong’, greu ‘heavy’, and leu ‘easy’, occur far more often in analytic comparative adjective formations than in synthetic adjective formations. Still others, such as aut ‘high’, occur more evenly in both analytic and synthetic comparative formations. These word-by-word differences between the adjectives give further support to the claim that the synthetic adjective formation is not a productive process in Old Occitan. If it were, we would expect to see a more even distribution of analytic and synthetic uses across different adjectives. Instead, the pattern of comparative adjective use is strongly lexicalized.
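The spread across adjectives, and how strongly the pooled 82.55% is pulled up by a few very frequent adjectives, can be seen by recomputing the per-adjective shares from the counts in Table 2.2. The sketch below is illustrative only; in particular, the unweighted mean across adjectives is an extra summary that this chapter does not itself report.

```python
# (analytic, synthetic, doubly marked) counts per positive adjective,
# copied from Table 2.2
TABLE_2_2 = {
    "aut": (203, 115, 2), "bel": (229, 50, 0), "fort": (359, 6, 1),
    "gen": (79, 294, 2), "greu": (73, 4, 0), "gros": (27, 9, 0),
    "laitz": (11, 2, 0), "larc": (38, 20, 0), "leu": (94, 3, 0),
    "lonc": (105, 22, 1), "nuallos": (0, 2, 0), "sorde": (0, 50, 1),
    "bon": (25, 1833, 4), "gran": (217, 3006, 2), "mal": (39, 287, 1),
    "molt": (1, 517, 0), "pauc": (8, 1034, 2),
}


def synthetic_share(analytic, synthetic, doubly_marked):
    """Synthetic forms as a percentage of all comparative tokens."""
    return 100 * synthetic / (analytic + synthetic + doubly_marked)


shares = {adj: synthetic_share(*counts) for adj, counts in TABLE_2_2.items()}
lowest = min(shares, key=shares.get)
highest = max(shares, key=shares.get)
unweighted_mean = sum(shares.values()) / len(shares)

print(lowest, f"{shares[lowest]:.2f}%")    # fort 1.64%
print(highest, f"{shares[highest]:.2f}%")  # nuallos 100.00%
print(f"unweighted mean: {unweighted_mean:.1f}%")  # roughly 54%, far below 82.55%
```

Weighting every adjective equally roughly halves the apparent preference for synthetic forms, which is the frequency skew described above.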

Although some grammars (e.g. Paden 1998) list the comparative “suffixes” -er and -or (and -azor for bellazor) among their lists of morphemes, the synthetic comparative formation in Old Occitan does not seem to be formed by adding these suffixes to the Old Occitan adjective roots. Doing so would not directly yield the synthetic forms found in Old Occitan. For example, the addition of the comparative suffix -or to the Old Occitan adjective aut ‘high’ would give the synthetic comparative adjective *autor rather than the attested synthetic comparative ausor, unless the addition of the synthetic comparative adjective “suffix” caused a morphophonemic alternation of the stem. Autor never occurs in the Old Occitan corpus with the meaning ‘higher’. More importantly, however, as noted in the previous section, the synthetic comparative adjective formation is restricted to the eighteen adjectives in Table 2.1.6 No other adjectives are used with synthetic adjective formations in the Old Occitan corpus. This lack of new word formations with the synthetic comparative adjective “suffixes” -er and -or clearly shows that this pattern is not productive. One measure of productivity is the ratio of tokens to types; productive morphemes and processes have a large number of types with very low frequency, often including many hapax legomena. Unproductive morphemes and processes, on the other hand, have a smaller number of types, often with larger token frequency. The synthetic comparative adjective formation has a very small number of types, only eighteen, but many of those types have a large number of tokens. While there are some types with a low token frequency, particularly the synthetic comparative formations of nuallos and laitz, they seem to be obscure forms rather than forms that are newly created by a productive process. In the case of nuallos and laitz, for example, not only are the token frequencies of the synthetic comparative adjective formations low, but the token frequencies of the analytic adjective formations and, indeed, of the positive adjectives themselves are low. By any measure, the synthetic adjective formation seems to be unproductive, though still firmly entrenched in the lexicon.

6 Table 2.2 and subsequent tables in the chapter have seventeen rows rather than eighteen because the two synthetic comparative forms for gran ‘great’ are combined in a single row. This was done because both synthetic comparative formations alternate with the analytic comparative formation plus gran.
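The type/token argument can be made concrete with the synthetic-form counts from Table 2.2, one type per row (so the two comparatives of gran share a row, as note 6 explains). The hapax-based figure below is Baayen's productivity measure P, the number of hapax legomena divided by the number of tokens; using it here is an editorial illustration, not a measure reported in this chapter.

```python
# Synthetic-form token counts per type, copied from Table 2.2
SYNTHETIC_TOKENS = {
    "ausor": 115, "bellazor": 50, "forsor": 6, "gensor": 294,
    "greugor": 4, "greussor": 9, "laigor": 2, "largor": 20,
    "leuger": 3, "lonhor": 22, "nualhor": 2, "sordejer": 50,
    "melhor": 1833, "major": 3006, "pejor": 287, "plusor": 517,
    "menor": 1034,
}


def productivity_profile(freqs):
    """Type count, tokens per type, hapax count, and Baayen's P."""
    tokens = sum(freqs.values())
    types = len(freqs)
    hapaxes = sum(1 for f in freqs.values() if f == 1)
    return {
        "types": types,
        "tokens_per_type": tokens / types,
        "hapaxes": hapaxes,
        "baayen_P": hapaxes / tokens,
    }


profile = productivity_profile(SYNTHETIC_TOKENS)
print(profile["types"], profile["hapaxes"], profile["baayen_P"])  # 17 0 0.0
```

With no hapax legomena at all and hundreds of tokens per type, the profile is the one expected of a closed, lexicalized set.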

It is also important to note that many of the synthetic comparative adjectives are suppletive and have a very high frequency. Suppletive forms, in most theories of morphology, are stored separately rather than created by processes such as affixation.

Their status as inherently stored forms makes it more likely that these forms will be kept when the productive process of synthetic comparative adjective formation in -ior in Latin is lost. When that process is no longer productive, any synthetic comparative forms that remain must be stored. Because the suppletive forms are already stored, there is no change in the creation or usage of their forms.

The high token frequency of these forms reinforces the irregular pattern of the synthetic comparative adjective formation, once it becomes an irregular pattern rather than the regular productive method of forming comparative adjectives. This is similar to the way in which high-frequency English words with irregular past tenses such as eat–ate and sleep–slept have retained their irregular past tenses through generations of speakers, while less frequent verbs such as weep–wept/weeped have lost their irregular past tense patterns and developed regular past tense endings in -ed (Bybee 2006). Bybee uses an exemplar model of grammar to propose that “frequency strengthens the memory representations of words or phrases, making them easier to access and thus less likely to be subject to analogical reformation” (2006, p. 715). In the case of the Old Occitan synthetic comparative adjective formation, for those synthetic comparative forms that occur with high token frequency, this high token frequency reinforces the synthetic formation and makes them less likely to be replaced by the analytic comparative adjective formation.


The two methods for forming comparative adjectives, then, are not competing word-formation and phrase-formation processes, but instead present a competition between stored lexical forms and a phrase formation. Beyond the large differences in the relative frequencies of analytic and synthetic comparative formations of different adjectives, other factors could also influence the distribution of analytic and synthetic forms. In the remainder of this chapter, I consider three of those factors: dialect, date, and text type, in Sections 2.3, 2.4, and 2.5, respectively.

2.2.3 Doubly Marked Comparatives in the COM

Some of the most interesting attestations of comparative adjectives are those that are doubly marked, using both the synthetic and analytic formations in one token. An example from a later thirteenth-century text is given in (5).

(5) Item la plus mayor7 ley que lo senhor deu avec de sons cavers… (FDBE)
The greatest obligation the lord owes to his knights…

Skårup (1997) notes that the synthetic comparative formations can themselves be preceded by plus, and that, in these cases, the synthetic comparative forms are to be treated as positive adjectives rather than comparative adjectives. Jensen (1976) takes the existence of these doubly marked forms as evidence that the synthetic comparatives are losing their comparative force during the Old Occitan period. Doubly marked comparative forms do occur in English, however, as well as in modern Romance languages such as French, Spanish, and Italian (Tekavčić 1972). In these languages, the fact that speakers occasionally use constructions such as English more better does not mean that the English synthetic comparative better is losing or has lost its comparative force. Doubly marked comparative forms are sometimes used for emphasis or used playfully in English; for example, one of the most famous double superlatives, “This was the most unkindest cut of all”, comes from Shakespeare’s Julius Caesar.

7 Mayor is a spelling variant of major, the suppletive synthetic comparative formation of gran. See Section 1.9 for spelling variation in manuscripts.

The same uses occur in Old Occitan. In the example in (5), the use of plus mayor rather than plus gran or mayor indicates emphasis. Other “great” items occur in this administrative logbook, each described as gran or major, but this one is even greater than the others. In addition, one third of the doubly marked comparative tokens occur in the lyric poetry, which accounts for only ten percent of the COM corpus. Because the lyric poetry is noted for its word play as well as its strict and inventive meters, this uneven distribution is not surprising. The uneven distribution further suggests, though, that the doubly marked comparatives were used for word play or emphasis, rather than because the synthetic comparatives are losing their comparative force.

Jensen notes specifically that ausor, the synthetic comparative form of aut, has “ceased to be considered a comparative in most cases” and combines with plus like other positive adjectives to form a comparative (1976, p. 119). Skårup (1997) gives ausor the same treatment. In the COM corpus, however, ausor occurs with plus only twice, a rate only slightly higher than that of the other synthetic formations. A Fisher’s exact test8 shows that the difference in relative frequency is not statistically significant (p = 0.1029).9 Both of these tokens, given in (6) and (7), seem to show the same usage as the other doubly marked comparatives, expressing emphasis or extremity. In (6), the barons are the very highest and most important, while in (7), an extreme degree of sanctity is attributed to the Virgin Mary.

(6) @En @Dragonetz apela lo comte so senhor, e foron al cosselh li baro plus ausor: (CCA)
Sir Dragonetz called the count, his lord, and the highest barons were assembled for counsel:

(7) Sus en l’onrat heretatge on so li sanhtor — la @Dieu me done salvatge! — el gra plus aussor a quis la Verges Maria, (PC 401 007)
Up in the honorable domain where there is sanctity, ‒ may God grant me salvation! ‒ the highest degree of which belongs to the Virgin Mary,

In the COM, then, we do find some cases of doubly marked comparatives, but they seem to be used for word play or, more commonly, to show extremity in description. While they do present interesting cases of comparative adjective use, they do not give compelling evidence that the synthetic comparative adjective formations had lost, or were losing, their comparative force during the Old Occitan period. Instead, they are used in the same ways that speakers of modern languages use doubly marked comparatives.

8 For the Fisher’s exact test, see Fisher (1954).
9 For the purposes of this thesis, p = 0.01 is used as the threshold for statistical significance.
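A two-sided Fisher's exact test needs nothing beyond exact binomial coefficients, so it can be sketched without a statistics library. The 2×2 table used here is an assumption, since the chapter does not print it: ausor's 2 doubly marked tokens against the other 318 comparative tokens of aut, and the remaining 13 doubly marked tokens against the other 8,393 comparative tokens in the totals of Table 2.2. With these margins the sketch returns p ≈ 0.103, consistent with the reported p = 0.1029 and above the 0.01 threshold of note 9.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denominator = comb(n, row1)

    def pmf(x):
        # hypergeometric probability that the top-left cell equals x
        return comb(col1, x) * comb(n - col1, row1 - x) / denominator

    observed = pmf(a)
    support = range(max(0, row1 - (n - col1)), min(row1, col1) + 1)
    p = sum(pmf(x) for x in support if pmf(x) <= observed * (1 + 1e-7))
    return min(p, 1.0)


# Assumed table: rows = aut vs. all other adjectives,
# columns = doubly marked vs. other comparative tokens
p = fisher_exact_two_sided(2, 318, 13, 8393)
print(f"p = {p:.4f}")  # about 0.103, above the 0.01 threshold
```

In practice, `scipy.stats.fisher_exact` with its default two-sided alternative is the usual library route; the hand-rolled version only makes the computation explicit.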

2.3 Analysis of Comparative Adjectives by Dialect of Text

In this section, I consider the use of analytic comparative forms by the dialect of the text. Only non-lyric poetry and prose texts with moderately secure locations, which account for approximately one third of the total corpus, are included in this analysis, and these texts were divided into seven dialects, as described in Section 1.7.2. In each group, all instances of comparative formations of the adjectives which participate in the synthetic comparative adjective formation were noted. Table 2.3 shows the number and percentage of synthetic forms of each adjective in each dialect; each row in the table shows the synthetic comparative adjective forms of a single adjective. For example, there were two comparative uses of aut in Alpine Provençal texts and neither was a synthetic form; both attestations were analytic formations. Doubly marked comparative forms are not counted in this table.


Table 2.3 Synthetic Comparative Adjective Forms by Dialect of Text.

Positive/         Alpine          Auvernhat       Gascon          Languedoc       Limousin        Provençal       Waldensian
synthetic         Provençal
aut/ausor         0/1 (0%)        0/1 (0%)        0/15 (0%)       1/10 (10%)      __              0/21 (0%)       0/10 (0%)
bel/bellazor      0/3 (0%)        __              2/6 (33.3%)     0/11 (0%)       __              0/9 (0%)        0/3 (0%)
fort/forsor       0/1 (0%)        __              0/16 (0%)       0/26 (0%)       __              0/8 (0%)        0/19 (0%)
gen/gensor        __              __              1/2 (50%)       1/2 (50%)       __              __              __
greu/greugor      __              __              0/2 (0%)        0/3 (0%)        __              0/3 (0%)        0/7 (0%)
gros/greussor     1/1 (100%)      __              0/6 (0%)        0/1 (0%)        __              0/2 (0%)        0/1 (0%)
laitz/laigor      __              __              __              __              __              __              __
larc/largor       __              __              0/3 (0%)        0/1 (0%)        __              0/11 (0%)       __
leu/leuger        __              __              0/1 (0%)        0/5 (0%)        __              0/1 (0%)        __
lonc/lonhor       0/1 (0%)        1/1 (100%)      1/9 (11.1%)     0/4 (0%)        __              4/12 (33.3%)    0/6 (0%)
nuallos/nualhor   __              __              __              __              __              __              __
sorde/sordejer    __              __              __              1/1 (100%)      1/1 (100%)      __              __
bon/melhor        21/21 (100%)    3/3 (100%)      136/136 (100%)  71/72 (98.6%)   2/2 (100%)      31/31 (100%)    48/48 (100%)
gran/major        7/9 (77.7%)     18/22 (81.8%)   655/678 (96.6%) 223/242 (92.1%) 4/4 (100%)      113/114 (99.1%) 122/125 (97.6%)
mal/pejor         0/1 (0%)        0/1 (0%)        2/6 (33.3%)     3/4 (75%)       1/1 (100%)      __              32/32 (100%)
molt/plusor       2/2 (100%)      9/9 (100%)      58/58 (100%)    43/43 (100%)    0/0 (0%)        22/22 (100%)    52/52 (100%)
pauc/menor        2/2 (100%)      2/2 (100%)      66/67 (98.5%)   78/78 (100%)    1/1 (100%)      44/44 (100%)    37/37 (100%)


None of the rows in Table 2.3 shows any clear pattern in the number or percentage of synthetic forms used when analyzed by dialect. In many instances, particularly in the Alpine Provençal, Auvernhat, and Limousin dialects, there are either no occurrences of any comparative formations of an adjective or so few occurrences that no conclusions can be drawn from the data. Even the most frequent comparative adjectives occur relatively rarely in these dialects, because so few texts can be securely assigned to these dialect areas. By contrast, the Provençal, Gascon, and Languedoc dialects include many more examples of comparative adjectives. In these cases, particularly in the adjectives with the highest frequency in the lower rows of the table, we can see that the dialect has little influence on the number and percentage of synthetic forms used.

For example, the percentage of the synthetic comparative of bon, melhor, differs by just over two percent between the dialect with the highest percentage of synthetic comparatives (100% in all but two dialects) and the dialect with the lowest percentage of synthetic comparatives (97.9% in Waldensian). Similarly, the synthetic comparative forms of pauc, menor, occur in at least 98.5% of all comparative uses of pauc in each of the seven dialects. The synthetic comparative of aut, ausor, on the other hand, occurs only once, in a text from the Languedoc dialect area, and the synthetic comparative of greu, greugor, does not occur in any of the dialects. The consistency of the percentage of synthetic comparative adjective formations across all of the dialects of Old Occitan shows that the dialect or location of the text has no impact on the use of comparative adjectives.
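The dialect comparison amounts to computing a percentage per cell while setting aside cells with too few tokens to interpret. The following sketch does this for the pauc/menor row of Table 2.3; the cut-off of five tokens is a hypothetical choice for illustration, not one used in this chapter.

```python
# pauc/menor row of Table 2.3: (synthetic, total comparative tokens)
PAUC_BY_DIALECT = {
    "Alpine Provençal": (2, 2), "Auvernhat": (2, 2), "Gascon": (66, 67),
    "Languedoc": (78, 78), "Limousin": (1, 1), "Provençal": (44, 44),
    "Waldensian": (37, 37),
}
MIN_TOKENS = 5  # hypothetical threshold for a usable cell


def summarize_row(row, min_tokens=MIN_TOKENS):
    """Return (lowest synthetic percentage, dialects with sparse cells)."""
    percentages = {d: 100 * s / n for d, (s, n) in row.items() if n > 0}
    sparse = sorted(d for d, (_s, n) in row.items() if n < min_tokens)
    return min(percentages.values()), sparse


lowest, sparse = summarize_row(PAUC_BY_DIALECT)
print(f"{lowest:.1f}%")  # 98.5% (Gascon, 66/67)
print(sparse)            # ['Alpine Provençal', 'Auvernhat', 'Limousin']
```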

The adjectives which seem to have the most variation, for example gros, varying from 0% to 100%, have so few tokens that no conclusions can be drawn from this variation. The only synthetic comparative of gros in these texts occurs in Alpine Provençal, where it is also the only comparative of gros in that dialect at all. While this could indicate that synthetic forms are more prevalent in Alpine Provençal, a look at the other comparative adjectives in this dialect shows that this is unlikely to be the case. The percentages of synthetic comparative forms of the other adjectives are in line with the percentages in the other dialects. The synthetic comparative form of gros, then, while unexpected, stands out only because the number of comparative adjective tokens of gros in Alpine Provençal is so very low.

It is clear, then, that dialect has no effect on the use of analytic or synthetic forms of comparative adjectives. The few differences we do see are either attributable to the small number of tokens or traceable to specific characteristics of individual texts within each dialect. In the case of Limousin, for example, the dialect is represented by only four texts. The individual variants and choices made in each text have a much larger effect on the picture of the dialect as a whole than any single text in, for example, the Gascon dialect, which is represented by forty-five texts. If we had more texts that could be securely assigned to some of these dialects, the individual differences would not appear as dialect differences. There is, however, no indication from these texts that the dialect of a text plays a role in the use of analytic and synthetic comparative adjectives.

2.4 Analysis of Comparative Adjectives by Date of Text

In this section, I consider the use of analytic comparative forms relative to the date of the text. The texts were analyzed in groups of fifty years, and only non-lyric poetry and prose texts with moderately secure dates, which account for 61.2% of the entire COM corpus, are included in this analysis, as explained in Section 1.7.1. In each group of texts, all instances of comparative formations of the adjectives with synthetic forms were noted.

Table 2.4 shows the number and percentage of synthetic forms of each adjective in each time period. For instance, there was one comparative use of bel in eleventh-century texts, and it was an analytic formation rather than a synthetic one. As in the previous section, doubly marked comparative forms are not counted as synthetic forms in this table.
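Grouping dated texts into fifty-year periods and tallying each cell can be sketched as follows. The bin boundaries are an assumption (half-open periods, so a text dated 1100 falls in 1100-1150), each token is assumed to carry a single year, and the token list is invented for illustration.

```python
from collections import defaultdict


def half_century(year, start=1050):
    """Label of the assumed half-open fifty-year period containing `year`."""
    low = start + 50 * ((year - start) // 50)
    return f"{low}-{low + 50}"


def tally_by_period(tokens):
    """tokens: (year, is_synthetic) pairs for one adjective.

    Returns {period: (synthetic count, total count)}, as in Table 2.4.
    """
    cells = defaultdict(lambda: [0, 0])
    for year, is_synthetic in tokens:
        cell = cells[half_century(year)]
        cell[0] += int(is_synthetic)
        cell[1] += 1
    return {period: tuple(cell) for period, cell in sorted(cells.items())}


# Invented tokens: one analytic use before 1100, three later uses
print(tally_by_period([(1075, False), (1210, True), (1240, False), (1260, True)]))
# {'1050-1100': (0, 1), '1200-1250': (1, 2), '1250-1300': (1, 1)}
```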


Table 2.4 Synthetic Comparative Adjective Forms by Date of Text.

1050-1100 1100-1150 1150-1200 1200-1250 1250-1300 1300-1350 1350-1400 1400-1450 1450-1500

aut/ __ __ __ 6/14 11/22 15/53 11/43 0/17 6/21 ausor 42.8% 50% 28.3% 25.6% 0% 28.6%

bel/ 0/1 __ 1/2 4/10 4/29 10/54 4/30 0/14 0/19 bellazor 0% 50% 40% 13.8% 18.5% 13.3% 0% 0%

fort/ 0/1 __ 0/0 3/16 0/47 0/77 1/48 0/33 0/30 forsor 0% 0% 18.8% 0% 0% 2.1% 0% 0%

gen/ 2/2 __ 2/3 8/11 3/13 33/43 19/25 0/1 2/4 gensor 100% 66.6% 72.7% 23.1% 78.6% 76% 0% 50%

greu/ __ __ __ 0/2 0/8 0/8 1/8 0/4 0/1 greugor 0% 0% 0% 12.5% 0% 0%

gros/ ______0/3 4/11 0/3 0/3 1/2 greussor 0% 36.4% 0% 0% 50%

laitz/ ______0/2 0/2 ______laigor 0% 0%

larc/ __ __ __ 1/5 0/1 1/3 0/3 0/12 __ largor 20% 0% 33.3% 0% 0%

leu/ __ __ 0/1 0/4 0/23 0/12 0/13 0/1 0/2 leuger 0% 0% 0% 0% 0% 0% 0%

lonc/ __ __ __ 2/6 0/9 2/25 0/4 2/16 0/7 lonhor 33.3% 0% 8% 0% 12.5% 0%

nuallos/ 1/1 __ __ __ __ __ __ __ __ nualhor 100%

sorde/ __ __ 0/0 8/8 3/3 3/3 6/6 __ __ sordejer 0% 100% 100% 100% 100%

bon/ 2/2 __ 12/12 172/174 172/175 277/280 155/158 83/84 99/100 melhor 100% 100% 98.8% 98.3% 98.9% 98.1% 98.8% 99%

gran/ 4/5 7/7 22/22 91/97 588/602 459/486 328/350 285/299 147/178 major 80% 100% 100% 93.8% 97.7% 94.4% 93.7% 95.3% 82.6%

mal/ 4/4 __ __ 9/10 34/36 35/49 9/12 1/2 11/13 pejor 100% 90% 94.4% 71.4% 75% 50% 84.6%

molt/ __ __ 1/1 37/37 53/53 67/67 27/27 36/36 56/56 plusor 100% 100% 100% 100% 100% 100% 100%

pauc/ __ __ 2/2 32/32 117/118 194/195 125/126 100/100 45/45 menor 100% 100% 99.2% 99.5 99.2% 100% 100%


Each row in Table 2.4 shows the synthetic forms of a single adjective through the half-centuries of Old Occitan texts. As in the case of the dialect of the texts, none of these rows shows any clear pattern in the number or percentage of synthetic forms used. In many cases, the comparative form of an adjective simply does not occur within a time span, or occurs with so few tokens that no conclusions can be drawn from the data. The earliest time periods, while the most interesting in many ways, are the most problematic in terms of number of tokens. Even the most frequent comparative adjectives have few tokens simply because there are so few texts which can be securely dated to these early periods. In other cases, however, particularly the high-frequency suppletive adjectives given at the bottom of the table, there are enough comparative forms across the time span to see that the date makes little difference in the percentage of synthetic forms used.

Although most adjectives in Table 2.4 are quite consistent across all of the time divisions in which there are sufficient tokens, there are some which warrant further comment. The synthetic comparative of gen, gensor ‘more noble’, occurs only three times out of thirteen comparative tokens of gen in texts between 1250 and 1300. Five of the analytic comparative formations, however, occur in a single text, the Ensenhamen d’Honneur, and this text does not use the synthetic comparative form of gen at all, though the comparative forms melhor, plusor, and major do occur in it. If we set this text aside, the synthetic form is used in three of the eight remaining comparative tokens of gen.

This percentage of 37.5%, while still lower than would be expected based on the other time periods in the row, is within the expected range, given how low the token number is. A Fisher’s exact test shows that the difference is not statistically significant (p = 0.1908).
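For readers who want to check comparisons of this kind, a two-sided Fisher’s exact test on a 2 × 2 table (synthetic vs. analytic counts in two groups) can be computed directly from the hypergeometric distribution using only the Python standard library. The text does not state exactly which two groups were compared to obtain p = 0.1908, so the example call below uses an illustrative pairing (the eight remaining 1250-1300 tokens against the other periods of the gen row in Table 2.4, pooled), and its result should not be expected to reproduce the reported value.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows are groups (e.g. two time periods); columns are synthetic vs.
    analytic counts. The p-value sums the probabilities of every table with
    the same margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one count
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)

# Illustrative pairing only: 3 synthetic / 5 analytic (1250-1300 without the
# Ensenhamen d'Honneur) vs. 66 synthetic / 23 analytic (other periods pooled)
print(round(fisher_exact_two_sided(3, 5, 66, 23), 4))
```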

Similarly, the synthetic comparative of aut, ausor, does not occur at all in the seventeen instances of the comparative of aut between 1400 and 1450. Given the percentage of occurrence of this synthetic form in other time periods, its absence here is quite surprising. Because it does occur in the latest time period, between 1450 and 1500, this does not seem to be a loss of the synthetic comparative of aut or a trend away from using it, but the figure does need an explanation. A closer look at the texts from this time period shows that non-lyric poetry is less well represented here than in some other periods: there are relatively few non-lyric poetry texts, most of them very short, and non-lyric poetry accounts for only 5% of the words in this time period. All three comparative uses of aut in the non-lyric poetry are analytic forms, not synthetic forms, and occur in the same religious text, the Vie de Saint Trophime. The bulk of the texts from this time period are prose texts and, as is discussed below in Section 2.5, the synthetic comparative of aut is very uncommon in prose texts. While the difference is statistically significant (p = 0.0036), the unexpectedly low percentage of synthetic uses of aut in texts dated between 1400 and 1450 is caused not by the date of the text, but by a combination of text type differences and the individual decisions or personal style of the author of the non-lyric poem Vie de Saint Trophime.

Although the text type differences and the individual decisions of authors are also present in the other time periods, they only seem to surface as a chronological difference in this time period because of the relatively low number of non-lyric poetry texts.


Finally, the synthetic comparative of gran, major, accounts for 147 of the 178 comparative tokens of gran (82.6%) between 1450 and 1500. While high, this percentage is lower than would be expected based on the other time periods in the row, all of which are above 90% except the earliest, which has only five tokens. Such a difference would not be a problem if the number of tokens were low, but it is not. Of the thirty-one tokens of analytic comparatives of gran, however, twenty come from two non-lyric poetry texts, the Mystère des Saints Pierre et Paul and the Mystère de Saint Pons. From the descriptions of the texts, there seems to be no reason why the use of comparatives of gran in these texts should be anomalous, but the pattern is exactly the reverse of what is found in other texts from this time period and, indeed, across the corpus. The synthetic form major is used once in the Mystère des Saints Pierre et Paul and twice in the Mystère de Saint Pons, while the analytic form, which is much less common than the synthetic form throughout the COM corpus, is used ten times in each text. While these texts are anomalous in their use of the comparative of gran, they do not seem to represent a trend in the use of synthetic comparatives. All other texts from the time period show the same patterns as those in other time periods. Thus, the use of comparative forms is fairly consistent throughout the Old Occitan period and the COM corpus, though individual authors may use these forms in different or unexpected proportions.

2.5 Analysis of Comparative Adjectives by Text Type

In this section, I consider the use of analytic comparative forms by the text type, separating lyric poetry, non-lyric poetry, and prose. Ricketts’s divisions of these texts into types for the tranches of the COM were used. Table 2.5 gives the number and percentage of synthetic comparatives of each adjective in each type of text; each row in Table 2.5 shows the synthetic forms of a single adjective.


Table 2.5 Synthetic Comparative Adjective Forms by Text Type

Adjective/        Lyric poetry       Non-lyric poetry   Prose
synthetic form

aut/ausor         58/79 (73.4%)      49/103 (47.6%)     10/138 (7.2%)

bel/bellazor      29/80 (36.2%)      19/139 (13.6%)     2/60 (3%)

fort/forsor       4/57 (7%)          2/138 (1.4%)       0/169 (0%)

gen/gensor        236/273 (86.4%)    36/74 (48.6%)      22/28 (78.6%)

greu/greugor      1/29 (3.4%)        0/18 (0%)          2/29 (6.9%)

gros/greussor     4/5 (80%)          2/10 (20%)         2/21 (10%)

laitz/laigor      2/4 (50%)          0/5 (0%)           0/2 (0%)

larc/largor       5/15 (33.3%)       3/9 (33.3%)        11/34 (32.3%)

leu/leuger        1/19 (5.2%)        0/27 (0%)          2/43 (4.6%)

lonc/lonhor       3/17 (17.6%)       4/29 (13.8%)       15/82 (18.3%)

nuallos/nualhor   30/30 (100%)       14/15 (93.3%)      6/6 (100%)

bon/melhor        506/517 (97.8%)    601/614 (97.8%)    726/731 (99.3%)

gran/major [10]   279/298 (93.6%)    525/596 (88.1%)    2207/2336 (94.6%)

mal/pejor         142/145 (97.9%)    68/89 (76.4%)      77/103 (74.7%)

molt/plusor       28/28 (100%)       59/59 (100%)       430/431 (99.7%)

pauc/menor        51/51 (100%)       176/180 (97.7%)    807/813 (99.3%)

10 Of the 279 synthetic comparatives of gran in the lyric poetry, one is gragnor, the outcome of the inherited Latin synthetic form grandior. The other 278 instances, which comprise the vast majority of the comparative forms of gran, are forms of the suppletive comparative major.

Unlike in the discussions of the date and dialect of the texts above, here we do find clear differences between text types in some adjectives. For most adjectives, the synthetic comparatives are much more frequent in lyric poetry, and are least frequent in prose texts.

For example, the synthetic comparative of aut, ausor, occurs in 58 of 79 instances (73.4%) in the lyric poetry but in only ten of 138 instances (7.2%) in the prose texts. The percentage of usage in the non-lyric poetry texts falls in the middle: 49 of 103 instances (47.6%). The difference is highly statistically significant (p < 0.0001). The difference in percentages of synthetic forms by text type is significant for several other adjectives as well, notably bel (p < 0.0001), fort (p < 0.005), gros (p = 0.003), and mal (p < 0.0001). For some adjectives, such as gen, the difference between text types is not statistically significant, but a weaker version of the same pattern can still be seen: the synthetic comparative formation is most frequent in the lyric poetry texts, less frequent in the non-lyric poetry texts, and least frequent in the prose texts.
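The text does not name the test behind these three-way comparisons. One standard option for a 2 × 3 table (synthetic vs. analytic across lyric poetry, non-lyric poetry, and prose) is Pearson’s chi-square test of independence, sketched below as an assumption rather than as the method actually used. With two degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. The approximation is unreliable when expected counts are small (as for gros or laitz), where an exact test would be the safer choice.

```python
from math import exp

def chi_square_by_text_type(synthetic, totals):
    """Pearson chi-square test of independence on a 2 x 3 table.

    synthetic: synthetic-form counts for (lyric, non-lyric, prose)
    totals: all comparative tokens for the same three text types
    Returns (statistic, p_value).
    """
    if len(synthetic) != 3 or len(totals) != 3:
        raise ValueError("expected counts for exactly three text types")
    if any(s < 0 or s > t for s, t in zip(synthetic, totals)):
        raise ValueError("each synthetic count must lie between 0 and its total")
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("no tokens")
    analytic = [t - s for s, t in zip(synthetic, totals)]
    statistic = 0.0
    for row in (synthetic, analytic):
        row_total = sum(row)
        for observed, column_total in zip(row, totals):
            expected = row_total * column_total / grand_total
            if expected == 0:
                raise ValueError("test undefined: a row or column total is zero")
            statistic += (observed - expected) ** 2 / expected
    # df = (2 - 1) * (3 - 1) = 2; the chi-square upper tail for df = 2 is exp(-x / 2)
    return statistic, exp(-statistic / 2)

# aut/ausor counts from Table 2.5 (lyric, non-lyric, prose)
stat, p = chi_square_by_text_type([58, 49, 10], [79, 103, 138])
print(f"chi-square = {stat:.1f}, p = {p:.1e}")
```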

As significant as these differences in distribution between text types seem at first glance, they are in reality even greater. The numbers and percentages given in Table 2.5 for the prose texts include all instances of the comparative forms in the texts, regardless of how they are used. Some of these instances, however, should not be considered uses of the synthetic forms: the words are not used but simply listed. For example, in the Regles de Trobar, a grammatical text from the end of the fourteenth century, we find the passage in (8).


(8) Son alcu d’autre nom qui han atressi mudamen pus qu’en alonguar e abreuiar d’una letra, aysi con gensor, qui fay lo nominatiu genser en lo singular, e en lo nominatiu plural diras gensor e en los autres cas diras gensors. Atressi ditz hom en lo nominatiu singular meyler e als autres cas meylor, en lo nominatiu singular diras auser e als altres cases diras ausor; e en lo plural seguisson la maneyra de gensor. Pero entendes la regla que t’ay ditxa ia: li nominatiu plural femeni s’alongen tos temps, per que hom ditz li rey son gensor, meylor, ausor; les reynes son gensors, meylors, ausors. (JDFR; emphasis mine)

There are some other nouns which have more changes than shortening or lengthening by one letter, such as gensor, which is in the nominative singular genser, and in the nominative plural you say gensor and in the other cases gensors. In the same way, one says meyler in the nominative singular and meylor in the other cases, and in the nominative singular you say auser and in the other cases you say ausor; and in the plurals they follow the same pattern as gensor. But according to the rule which I have already told you, the feminine nominative plural forms always lengthen, so that one says that kings are gensor, meylor, ausor, but queens are gensors, meylors, ausors.

In this passage of the Regles de Trobar, we find seven instances of the synthetic comparative of gen, gensor, and four instances of the synthetic comparative of aut, ausor.

In addition, two manuscripts of this text are included in the COM corpus,11 and this list of synthetic forms appears in both. This one text, then, accounts for fourteen of the twenty-two instances of the synthetic comparative of gen in the prose and eight of the ten attestations of the synthetic comparative of aut. Not only does this clustering of otherwise infrequent forms skew the distribution of synthetic and analytic comparative adjective formations in the prose texts, but the instances included in (8) should not be included in a consideration of the use of the comparative adjectives at all. They represent mentions of the word rather than uses. A word or expression being used involves reference to whatever the expression refers to; a word or expression being mentioned, on the other hand, involves reference to the expression itself (Quine 1940). In the case of the adjectives from the Regles de Trobar in (8), the reference of each of the synthetic comparative adjectives is the expression itself. Instead of being used by an author in a meaningful way, these words are simply mentioned as the author describes how the comparative adjective ought to be formed for the nominative and oblique, masculine and feminine.

11 This, as discussed in Section 1.3.3, is very uncommon and represents an inconsistency in the COM corpus. For the vast majority of texts, a single critical edition or diplomatic edition of each text is used, rather than multiple manuscripts.

Similarly, in the Donatz Proensals of Uc Faidit, leuger and greuger, the synthetic comparative forms of leu and greu respectively, appear in a list of words whose nominative singular form does not end in -s. Because this text, like the Regles de Trobar, appears in the prose text corpus in two very different manuscripts, this list accounts for all of the attested synthetic comparative forms of leu and greu in the prose texts. Also included in the list are sordejer, pejer, majer, melher, menre, and genser.

Finally, in addition to the instances of the synthetic comparative of aut, ausor, that appear in the Regles de Trobar, yet another attestation of ausor is mentioned, rather than used. The word ausor appears in an Old Occitan-Latin glossary apparently intended for who wanted to write troubadour poetry correctly.
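Removing mention attestations amounts to filtering tokens before they are counted. The sketch below assumes each token has been hand-annotated with a hypothetical `is_mention` flag; the class and field names are illustrative only. The usage rebuilds the prose counts for aut from the figures given above: nine mentions of ausor (eight in the two manuscripts of the Regles de Trobar and one in the glossary), one use, and the remaining 128 tokens, all treated as analytic for the sake of the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    """One comparative token (hypothetical field names)."""
    adjective: str
    text_type: str    # "lyric", "non-lyric", or "prose"
    synthetic: bool   # True for a synthetic form such as ausor
    is_mention: bool  # True if the form is cited as a word (e.g. in a grammar), not used

def count_forms(tokens, adjective, text_type, drop_mentions=True):
    """Return (synthetic count, total count) for one adjective and text type."""
    selected = [
        t for t in tokens
        if t.adjective == adjective
        and t.text_type == text_type
        and not (drop_mentions and t.is_mention)
    ]
    return sum(t.synthetic for t in selected), len(selected)

prose_aut = (
    [Token("aut", "prose", True, True)] * 9        # mentions (Regles de Trobar x2, glossary)
    + [Token("aut", "prose", True, False)]         # the single use of ausor
    + [Token("aut", "prose", False, False)] * 128  # remaining tokens, treated as analytic
)
print(count_forms(prose_aut, "aut", "prose", drop_mentions=False))  # (10, 138)
print(count_forms(prose_aut, "aut", "prose"))                       # (1, 129)
```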

If the cases in which the synthetic comparative forms are mentioned rather than used are removed from the analysis, the distribution of analytic and synthetic formations appears as in Table 2.6. The numbers and percentages that differ from those in Table 2.5 are marked with an asterisk.

Table 2.6 Synthetic Comparative Adjective Forms by Text Type with Mention Attestations Removed

Adjective/        Lyric poetry       Non-lyric poetry   Prose
synthetic form

aut/ausor         58/79 (73.4%)      49/103 (47.6%)     1/129 (0.7%)*

bel/bellazor      29/80 (36.2%)      19/139 (13.6%)     2/60 (3%)

fort/forsor       4/57 (7%)          2/138 (1.4%)       0/169 (0%)

gen/gensor        236/273 (86.4%)    36/74 (48.6%)      7/13 (53.8%)*

greu/greugor      1/29 (3.4%)        0/18 (0%)          0/29 (0%)*

gros/greussor     4/5 (80%)          2/10 (20%)         2/21 (10%)

laitz/laigor      2/4 (50%)          0/5 (0%)           0/2 (0%)

larc/largor       5/15 (33.3%)       3/9 (33.3%)        11/34 (32.3%)

leu/leuger        1/19 (5.2%)        0/27 (0%)          0/43 (0%)*

lonc/lonhor       3/17 (17.6%)       4/29 (13.8%)       15/82 (18.3%)

nuallos/nualhor   30/30 (100%)       14/15 (93.3%)      4/4 (100%)*

bon/melhor        506/517 (97.8%)    601/614 (97.8%)    716/721 (99.3%)*

gran/major [12]   279/298 (93.6%)    525/596 (88.1%)    2205/2334 (94.5%)*

mal/pejor         142/145 (97.9%)    68/89 (76.4%)      75/101 (74.3%)*

molt/plusor       28/28 (100%)       59/59 (100%)       430/431 (99.7%)

pauc/menor        51/51 (100%)       176/180 (97.7%)    807/813 (99.3%)

12 Of the 279 synthetic comparatives of gran in the lyric poetry, one is gragnor, the outcome of the inherited Latin synthetic form grandior. The other 278 instances are forms of the suppletive comparative major.

Table 2.6 shows that, while some adjectives follow a clear trend of usage based on text type, no single pattern accounts for the usage of all adjectives. For many of the adjectives with both an analytic form and a synthetic form, the synthetic form is much more commonly used in the lyric poetry than in other text types. The synthetic forms of these adjectives are used less frequently in the non-lyric poetry than in the lyric poetry and very rarely in the prose texts. This difference is statistically significant for the comparative formations of aut (p < 0.0001), bel (p < 0.0001), fort (p < 0.005), gen (p < 0.0001), gros (p = 0.003), and mal (p < 0.0001). The trend is not statistically significant but is still notable for the adjectives leu and greu, for which the only attestation of the synthetic comparative of each is in the lyric poetry. For most adjectives, the distribution of analytic and synthetic comparative forms in the non-lyric poetry lies between that of the lyric poetry and that of the prose texts.

For these adjectives, it seems clear that the choice between the analytic and synthetic comparative is based, at least partly, on the type of the text. The synthetic forms are associated with the lyric poetry and are very rarely used outside of this tradition. Aside from the patterns of usage themselves, there are two additional indications of the association of the synthetic comparatives with the lyric poetry. The first is that these synthetic comparative adjectives receive explicit attention in grammars intended to teach the reader to write troubadour poetry, such as the Regles de Trobar and the Donatz Proensals. If these synthetic comparative adjective forms were used in all text types and forms of the language, the writer of a text such as the Regles de Trobar might reasonably have expected his audience to be aware of such forms and not to need such a detailed explanation of the forms and their endings. That such an explanation was included demonstrates that the author did not believe the reader would be familiar with the use of these forms, which were almost exclusively used in the lyric troubadour poetry.

The second piece of evidence that the synthetic comparative forms of these adjectives were associated with lyric poetry is that religious texts seem to avoid using them almost entirely. Almost none of the few examples of the synthetic comparative formation of these adjectives come from religious texts and, as discussed in Section 2.4, the percentage of synthetic comparative adjectives in the Mystère des Saints Pierre et Paul and the Mystère de Saint Pons, religious poems from late in the Old Occitan period, is low enough to make the overall percentage for that time period lower than expected. Religious texts in both non-lyric poetry and prose use a significantly lower percentage of synthetic comparative adjective forms than other non-lyric poetry and prose texts from the same time period and dialect. The text type difference that causes a difference in the use of analytic and synthetic comparative adjective forms, then, is more fine-grained than that given in Table 2.6. Not only the text type itself but also the subject matter and the purpose of the text influence whether the analytic or synthetic comparative adjective form is used. Religious texts use the fewest synthetic comparative adjectives, followed by the medical texts. Administrative texts use some of these synthetic forms, but not many. The bulk of the uses of these synthetic comparative adjective forms, however, come from historical texts. In fact, in the prose texts, the only attested use of the synthetic comparative form of aut, ausor, where the form is used rather than mentioned, is in the Vidas. The Vidas are a collection of short biographies of troubadours, kings, and other important people of the time period. They are strongly associated with the lyric troubadour tradition, and a troubadour’s vida frequently appears in a manuscript alongside his lyric poems. The relatively high percentage of synthetic comparative adjectives in a text so closely associated with the lyric poetry also highlights the connection of these synthetic comparatives with the lyric poetry. The relative avoidance of these forms in religious works, on the other hand, is consistent with the idea that the authors were aware of the association of these words with the lyric tradition and sought to distance themselves from this tradition.

While this pattern (high frequency of synthetic comparative formations in the lyric poetry texts, lower frequency in the non-lyric poetry texts, and very low frequency in the prose texts) is very clear for some adjectives, a very different pattern is found for others. Most of the suppletive synthetic comparatives, for example, are extremely frequent in all types of texts. They are used almost exclusively for the comparative formations of these adjectives regardless of the type of text. Of the suppletive comparative adjectives, only the synthetic comparative of mal, pejor, shows a difference in usage based on text type, and this difference parallels the one shown by the adjectives discussed above. The other suppletive synthetic comparatives, melhor, major, plusor, and menor, occur in the clear majority of the dozens or hundreds of comparative uses across the text types and, unlike the adjectives discussed above, show no clear trend of usage based on text type. These are by far the most frequent adjectives studied, and it is the frequency of their comparative use that is the key to the pattern of usage. Because they are such highly frequent words, the synthetic forms are used and heard regularly, and it is less likely that the forms will become associated with one specific type of text or speech. The adjectives discussed above, on the other hand, such as aut, bel, and gen, are much less frequently used in the language, and this low frequency may make them more susceptible to changes in pattern and association. It is important to note that the lyric poetry, which comprises less than 10% of the extant texts, accounts for just over 20% of the comparative adjectives overall. This relatively high frequency would make it easy for the infrequent synthetic forms to become associated with the lyric poetry for two reasons: first, because comparative adjectives are more frequent there, the synthetic forms would be more likely to appear; and second, because the troubadours are known for their word play and exotic rhyme and meter and sometimes use obscure words to achieve this. In the case of the highly frequent synthetic comparative adjectives, however, this association has not taken place, and the synthetic forms are used with essentially the same frequency across text types.

Finally, there are three adjectives which fall into neither of these two categories. They are not highly frequent adjectives with suppletive synthetic comparatives such as bon/melhor, nor do they seem to follow the pattern of text type use described above for adjectives such as aut. These adjectives, larc, lonc, and sorde, show no difference in the percentage of synthetic forms across the text types. For sorde, this is because the analytic comparative plus sorde is never used in any text. Any time the comparative form of this adjective appears, the synthetic form is used. In the single case noted in Table 2.6 where the synthetic comparative is not used, the form used is the doubly marked comparative plus sordejor. It is important to note that while the percentage of usage of synthetic forms does not change across text types, the use of the comparative form of the adjective does. Although the lyric texts comprise less than 10% of the Old Occitan corpus, they account for 61% of the instances of sordejor. The comparative use of this adjective is very rare in prose texts, though no difference in the use of analytic and synthetic forms has developed. Instead, prose authors more often use other comparative adjectives, such as pejor, to mean ‘poorer’ or ‘worse’, using sordejor only very rarely.

For the final two adjectives, larc and lonc, though the percentage of use of the synthetic comparative remains consistent across text types, there is a difference in when and how often these adjectives are used. In the prose texts, the attestations of the synthetic comparative forms of these adjectives are clustered in a small number of texts and are absent from the rest of the prose texts. All fifteen tokens of the synthetic comparative of lonc, lonhor, occur in only four prose texts: Da un Archivio notarile di Grassa, the Compendion de l’Abaco, the Registre de Dienne, and the Vidas. By contrast, the three attestations of lonhor in the lyric poetry occur in three separate texts.

Similarly, the eleven attestations of the synthetic comparative form of larc, largor, occur in only three texts. These three texts, Da un Archivio notarile di Grassa, the Compendion de l’Abaco, and the Vidas, are among the texts in which lonhor also appears. The use of the synthetic comparative of these adjectives among prose texts, then, is restricted to these few texts. In addition, the use of the synthetic comparative formations in the Vidas may provide evidence that the association between the synthetic comparative forms and the lyric troubadour poetry exists for these adjectives also. The authors of the other three prose texts, however, choose to use the synthetic comparative forms very frequently, probably simply as an individual stylistic feature, as there seems to be no clear explanation linking the three texts and their usage of the synthetic comparative forms of larc and lonc. What is clear, however, is that instead of being spread throughout the prose texts, as they are throughout the lyric poetry texts, the synthetic comparative forms of larc and lonc are tightly clustered in a very small set of texts. While this pattern of usage is quite different from that discussed above for other non-suppletive comparative adjectives, the text type does seem to play a role.
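The contrast between clustered and dispersed attestations can be summarized with two simple numbers per form: how many distinct texts contain it, and what share of its tokens the single richest text supplies. In the sketch below, the lyric example follows the figures above (three tokens of lonhor in three texts), but the text does not give the per-text breakdown of the fifteen prose tokens, so the prose split and the text identifiers are invented purely for illustration.

```python
from collections import Counter

def dispersion(text_ids):
    """Summarize how a form's tokens are spread across texts.

    text_ids: one text identifier per token.
    Returns (number of tokens, number of distinct texts,
             share of tokens in the text with the most tokens).
    """
    per_text = Counter(text_ids)
    n_tokens = sum(per_text.values())
    top_share = max(per_text.values()) / n_tokens if n_tokens else 0.0
    return n_tokens, len(per_text), top_share

# Lyric lonhor: three tokens in three separate texts (as stated above)
lyric_lonhor = ["lyric_1", "lyric_2", "lyric_3"]
# Prose lonhor: fifteen tokens in four texts; this per-text split is hypothetical
prose_lonhor = ["prose_1"] * 6 + ["prose_2"] * 5 + ["prose_3"] * 2 + ["prose_4"] * 2

print(dispersion(lyric_lonhor))
print(dispersion(prose_lonhor))
```

A form spread evenly over many texts has a token count close to its text count and a small top share; a clustered form shows the opposite.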

2.6 Conclusion

As can be seen in this chapter, while the date and the dialect of the texts give no insight into the use of analytic and synthetic comparative forms for the subset of adjectives that occur with both formations, the type of text does influence the usage of these forms. In addition, considering each adjective individually gives a much more revealing picture than grouping all of the comparative adjectives together in a single quantitative analysis. For many adjectives, including aut, bel, fort, gen, and gros, the synthetic comparative adjective forms are strongly associated with the lyric poetry texts. They occur much more frequently in the lyric poetry texts than in the non-lyric poetry or prose texts. While this trend is statistically significant for some adjectives, it is not significant for others. The correlation between lyric poetry and the synthetic comparative formations of these adjectives is, however, very strong. The competition between the analytic and synthetic comparative forms described in grammars such as Anglade (1977) and Crescini (1926) describes the pattern of usage in the lyric poetry texts very well, and that of the non-lyric poetry texts to a lesser extent, but in the prose this competition is almost non-existent. Aside from the grammars and vocabularies, the non-suppletive synthetic comparative adjectives are very rarely used in prose, and where they do occur, they are clustered within a small number of texts rather than spread throughout the corpus. Instead, the analytic comparative adjective formation is used almost exclusively.

For the suppletive comparative adjectives, however, though there is some difference in the percentage of use of the synthetic comparative of mal, the high frequency of usage is relatively consistent throughout all text types. Even in the case of mal, the synthetic forms are commonly used throughout all types of texts; they are simply more common in the lyric poetry. For these suppletive comparative adjectives, then, the expected comparative adjective formation is the synthetic comparative formation, rather than the analytic formation that is expected for the vast majority of adjectives in Old Occitan.

The differences in the distribution of analytic and synthetic adjective formations by text type found in this close analysis should be taken into account in descriptions of Old Occitan. Rather than simply listing the adjectives and their forms, such descriptions should note that while some synthetic comparatives are used throughout the Old Occitan texts precisely as described in the grammars, other synthetic comparative adjectives are highly associated with, though not quite restricted to, the lyric troubadour poetry. While this is primarily a lexical difference, in that these particular lexical formations are associated with the lyric poetry, this difference affects the description of the grammar of Old Occitan. Some of the competition between analytic and synthetic comparative adjective formations is an artifact of the text type, rather than a development of the Old Occitan language.


Chapter Three: Adjective Derivation

3.0 Adjective Derivation

I turn now to the evidence pertaining to text type differences that can be seen in the derivation of adjectives. Unlike the case of the comparative adjectives discussed in Chapter Two, text type is not the only factor that influences the pattern of adjective derivation in Old Occitan; nonetheless, it is clear that the type of text does affect the creation and use of derived adjectives.

3.1 Adjective Derivation in Grammars and Descriptions of Old Occitan

In his Word-Formation in Provençal, Adams (1913) considers three major methods of word formation: word formation by adding a prefix, word formation by adding a suffix, and parasyntheta, the simultaneous addition of both a prefix and a suffix,1 in addition to other methods of word formation, such as the development of deverbal nouns2 and compounding. While all of these word-formation types could prove interesting, in this analysis I focus on a very small subset: suffixes used to form adjectives. These suffixes are all discussed in Part I of Adams’s list, Suffix-formation.

1 These are often referred to elsewhere as circumfixes, though Adams also includes in this category cases in which both the prefix and suffix added exist independently in the language, but in which no forms are found with only one of them.
2 Or, as Adams calls them, postverbal nouns (p. 536).

Suffixation is, according to Paden (1998), the most powerful, or productive, type of word formation in Old Occitan, and he relates this preference, along with morphological changes occurring at the end of verbs, nouns, and demonstratives in Old Occitan, to accent, which tends to fall at or near the end of words in Old Occitan (p. 317). This means that Old Occitan may be characterized typologically as a right-edge language.

Descriptions of the five adjective-deriving suffixes to be considered are given in the following section. The rest of the chapter compares the patterns of use of these five adjective forming suffixes in the COM, considering how they are used in both adjectives and in names, and how their patterns of usage can be partially explained by examining the date, dialect, and text type of the texts.

3.1.1 Derivational Suffixes to be Analyzed

There are many suffixes used to derive adjectives in Old Occitan. This chapter analyzes the patterns of usage of five of those suffixes: -al, -art, -enc, -esc, and -ivol.

Four of these suffixes are, at least descriptively, very similar. Adams (1913) discusses all of them except -ivol, which I introduce last, together with -an, -ar, -es, and -in as suffixes added to nouns, although they are often added to adjectives and verbs as well. He argues that no strict classification of their meanings can be made, as their meanings “shade into each other”, and they all give the same kind of force or meaning to the stem they attach to: ‘pertaining to’ or ‘belonging to’ or, when added to adjectives, no additional meaning at all (p. 268). In the latter case, the derivational suffixes may be added in poetry for metrical reasons, but such additions also occur in prose texts. For example, both purpra and purpurenc mean ‘purple’, and purpurenc appears only in prose texts. Many roots occur with more than one of these suffixes in the Old Occitan corpus, such as fogal and foguenc, both meaning ‘fiery’, composed of the root fog- with the derivational suffixes -al and -enc, respectively.
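As a rough illustration of how candidates for these five suffixes might be pulled from a word list, the sketch below matches a stem, one of the suffixes, and an optional inflectional ending. The set of endings (-a, -as, -s) and the gu → g spelling normalization are simplifying assumptions, and a pattern like this overgenerates (for instance on nouns and names that merely end in -al or -esc), so its output would still need manual checking.

```python
import re

SUFFIXES = ("al", "art", "enc", "esc", "ivol")
# A stem of at least two letters, one of the five suffixes, and an optional
# inflectional ending; the endings -a, -as, -s are a simplifying assumption.
PATTERN = re.compile(
    r"^(?P<stem>\w{2,}?)(?P<suffix>" + "|".join(SUFFIXES) + r")(?:as|a|s)?$"
)

def split_derived(word):
    """Return (stem, suffix) if word looks like a derived form, else None."""
    match = PATTERN.match(word.lower())
    if match is None:
        return None
    stem, suffix = match.group("stem"), match.group("suffix")
    # Assumed spelling rule: 'gu' before e/i marks a hard /g/, so the 'u'
    # belongs to the spelling, not the stem (foguenc shares fog- with fogal).
    if stem.endswith("gu") and suffix[0] in "ei":
        stem = stem[:-1]
    return stem, "-" + suffix

for word in ("fogal", "foguenc", "purpurenc", "purpra"):
    print(word, split_derived(word))
```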

3.1.1.1 -al

Alphabetically, the first of the derivational suffixes is -al, derived from the Latin derivational suffix -alis (Adams 1913, Meyer-Lübke 1923). The use of -al to derive adjectives in Old Occitan is very broad and varied: it was one of the most widespread adjectival suffixes in Latin and continued as such in most of the Romance languages, and Adams describes it as “exceedingly common” (p. 287). In his Grammaire des Langues Romanes, Meyer-Lübke describes Latin -alis as referring to similarity or relation “dans la plus large acception” (‘in the broadest sense’, p. 523), listing dozens of words derived using this suffix in Latin as well as later derivations in the daughter languages. In addition to meaning ‘belonging to’ or ‘pertaining to’, it is used to refer to suitability, relationships, and, in some official terminology, rank. Adjectives ending in -al, both those inherited from Latin and those created during the Old Occitan period, are often used substantively, and some of these have lost their adjectival force and are best described as nouns in Old Occitan texts (Paden 1998). None of the discussions of -al as a derivational suffix mention names created using this suffix, but such names do occur in Old Occitan; for example, Barral, Artal, and Tubal.

93

3.1.1.2 -art

While the use of -al in names is not often mentioned, other derivational suffixes are recognized as widely and visibly used in names. Two of these adjective-deriving suffixes are worth special mention because they are borrowed from the Germanic languages that were spoken by the and other Germanic tribes who invaded the area. In the fourth century, Celtic , like much of Europe, was affected by the Germanic invasions. The Visigoths, an East Germanic tribe, moved into southern Gaul in the fourth century. Despite their large numbers, perhaps as many as 100,000, they seem to have been unable or unwilling to resist the drift toward Romanization (Wallace-Hadrill 1985). Like the before them, they were quickly and almost thoroughly Romanized, except in religion: they were Arian Christians.

The Gothic language was apparently lost quickly; according to Wallace-Hadrill, “Latin was the primary language of the of the second and succeeding generations” (p. 26). Contemporary sources mention neither bilingualism nor a continued use of the Gothic language, unlike the situation for Frankish in Northern France (e.g. Hall 1974 and James 1982). I have argued elsewhere (Wilson 2009) that the number and breadth of borrowings, particularly the borrowing of derivational morphemes, indicate that the contact between the dialects of Vulgar Latin that developed into Old Occitan and the Germanic languages in southern France must have been deeper and more prolonged than is recorded. In addition, Gothic had some influence on the languages of the Iberian Peninsula, which indicates that the Gothic language survived at least until the Visigoths migrated to modern-day Spain (Penny 2002). What is important here, however, is that two of the derivational morphemes in Old Occitan which are of Germanic origin are adjective-forming suffixes: -art and -enc.

One of these morphemes, -art, developed out of a Germanic word rather than a suffix. All of the sources consulted agree that it is derived from the Germanic word ard or hard ‘hard, strong’. It was introduced into Old Occitan almost exclusively by means of names.

According to Adams (1913), it was found at first only in proper names, such as Adelhart or Richart, and only later was detached and used as a regular suffix, serving to form adjectives from nouns. It is important to note that many name elements besides -art and -enc (which is introduced in the following section) were borrowed into Old Occitan, but none of the other elements were reanalyzed and used as productive suffixes to derive adjectives or other words. In the readings used in Paden’s An Introduction to Old Occitan, over half of the names found are of Germanic origin. Similarly, Chambers (1971) and Dauzat (1951) give Germanic etymologies to large numbers of names found in Old Occitan works.

After this reanalysis of -art as a suffix, the proper names from which the suffix was detached seem to have been regarded as words denoting the dominant characteristics of the persons whom they named (Adams 1913). That is, Richart ‘one with strong wealth’, from a Germanic compound of *reiks ‘wealth’ and hard ‘strong’, was reanalyzed as a root rich followed by a suffix, with the new suffixed form meaning ‘one characterized by wealth’. Although the interpretation of the structure of the name is different, the meaning did not change much. In addition, there is a layer of mixed names, where the suffix, or name element, is added to a Latin stem: for example, Bonart, Brassart, and Noctart, in which the first parts of the names are bon ‘good’, bras ‘arm’, and noct³ ‘night’, respectively (Chambers 1971, Dauzat 1951). The kinds of stems used are similar in borrowed Germanic names and in mixed Latin-Germanic names.

In Old Occitan, -art, like -al discussed above, is used as a suffix to indicate the possession of a quality, but it can also indicate, more specifically, the strength of that quality; it is often translated ‘full of X’. Common adjectives formed with this suffix include auzart ‘daring’, testart ‘stubborn’, eissart ‘the same, one’s own’, moisart ‘false’, and galhart ‘cheerful’. The last of these is noteworthy because the stem, galha, is neither Latin nor Germanic in origin; it is derived from Celtic *galia (Paden 1998). Galhart is one of the most common words in -art in Old Occitan. In some cases, the suffix -art can have a “pejorative or argumentative” sense (Anglade 1977, p. 385).

3.1.1.3 -enc

The suffix -enc was also borrowed from Germanic, but it seems to have developed from an actual suffix rather than through reanalysis of the second element of a compound.⁴ There is some debate about its specific origin: Anglade (1977) argues that it derives from a Germanic suffix -enga, which was chiefly used as a patronymic, or a marker of tribe or origin. Philipon (1907), on the other hand, argues on the basis of the feminine form that it derives instead from a pre-Germanic suffix *-inko, and Paden (1998) gives a derivation from Germanic -inc. In any case, the suffix was, like -art, introduced into Old Occitan primarily by way of names, such as Loarenc, Bradenc, and Bertenca. As with names in -art, in addition to these borrowed Germanic names, there are attested mixed names ending in -enc based on a Latin root, such as Cretienc, Aurenca, and Arabienc (Chambers 1971, Dauzat 1951). In these mixed names, in which the first parts of the names are Crete, Aure (the name of a river), and Arab, the original meaning of tribe or origin of the Germanic suffix is fairly clear.

3 Or bratz and noch, as the second and third are more often cited in Old Occitan.

4 There are both names and adjectives in Old Occitan in which the Germanic suffix is found with a Germanic root, so it is likely that such words and names ending in -enc were borrowed into Old Occitan whole and only later reanalyzed, with the suffix then extracted.

In Old Occitan, -enc is used to mark origin, resemblance, material, or color.

Common adjectives formed with this suffix include sebenc ‘illegitimate’ (literally, ‘one from under a bush’), flamenc ‘fiery’, albenc ‘white’, and majenc ‘something that happens in May’. In one interesting case, the suffix is added to the preposition or adverb meaning ‘except, out of’ to give the adjective forenc ‘foreigner’ (Adams 1913).

The two adjective-forming suffixes of Germanic origin, -art and -enc, are of particular interest in the analysis that follows: comparing their frequency with that of the suffixes of Latin origin may reveal something about how far they had been integrated into the grammar and lexicon of Old Occitan, in addition to any effects of text type and other parameters.

3.1.1.4 -esc

Another suffix analyzed in this chapter is -esc, which also shows some Germanic influence. The suffix -esc is not included in the list of derivational suffixes in Anglade’s (1977) grammar, but it is included in Adams (1913) and in Meyer-Lübke’s (1923) Grammaire des langues romanes. Meyer-Lübke argues that Old Occitan -esc is simply derived from Latin -iscus, an old borrowing of Greek -ισκος. According to Adams (1913), on the other hand, the Old Occitan suffix -esc comes from two sources, the Latin suffix -iscus and the Germanic suffix -isk. The form of -esc comes directly from Latin -iscus, though there may be some Germanic influence on the vowel quality, while its meaning is strongly influenced by that of Germanic -isk (Adams 1913). The Latin suffix -iscus was originally a diminutive, while Germanic -isk is a patronymic. In Old Occitan, adjectives formed with -esc are neither diminutives nor patronymics. Instead, some adjectives with -esc have a basic ‘pertaining to’ meaning, while many others have a meaning of nationality or origin (p. 186). It is in the latter meaning that the influence of the Germanic suffix -isk can be seen most clearly (Adams 1913, p. 310), as the link between patronymic and nationality is fairly clear, while there seems to be no clear semantic link between diminutive and nationality. It may be useful to consider the Old Occitan suffix -es as an alternate source of this meaning of nationality or origin. The suffix -es, descended from Latin -ensis, has precisely this meaning of nationality, and its similarity in form would make analogy between it and -esc plausible. In any case, in Old Occitan, many adjectives formed with -esc refer to nationality or to groups of people, such as espanesc ‘Spanish’, sarrazinesc ‘Saracen’, and even proensalesc⁵ ‘Provençal’, while others, such as folesc ‘foolish’, have a more general meaning.

5 Note that the suffix -al has already been added to the root; the addition of -esc thus adds little meaning.

3.1.1.5 -ivol

The final suffix analyzed in this chapter, -ivol, is somewhat different. Used mostly to form adjectives from verbs, -ivol is described by Anglade (1977) as fairly rare and appearing principally in texts of the Waldensian dialect. Unlike some of the other suffixes discussed in this section, -ivol has a clear and consistent meaning of possibility, both active and passive (Adams 1913). The Old Occitan suffix -ivol is derived from Latin -ibilis⁶ under the influence of the Italian suffix -evole, derived from the same Latin suffix (Adams 1913, Anglade 1977). If -ivol is indeed a feature of the Waldensian dialect, this influence is quite plausible given the geographic proximity of the Waldensian dialect area to speakers of Italian and other Eastern Romance dialects. Besides its near-exclusive occurrence in the Waldensian dialect and the Italian influence, Adams (1913) argues that adjectives formed with -ivol are learned formations that draw on Latin -ibilis fairly directly. Thus, the form of the suffix is -ivol rather than -evol, which we would expect to be the reflex of -ibilis in Old Occitan.

It is worth noting, however, that other learned formations from Latin -ibilis also occur in Old Occitan with the form -ible (for example, franhible ‘breakable, fragile’), modeled on Latin words such as audibilis (Adams 1913). One of the purposes of the analysis that follows is to evaluate the claim that this suffix is dialectally constrained. Although -ivol differs from the other suffixes under consideration in some important ways, it is included in this analysis in order to test its dialectal basis and to compare the pattern found with -ivol to the patterns found with the other suffixes.

6 Old Occitan also has an adjective-deriving suffix -able, from Latin -abilis, which is added to verb roots to create adjectives of possibility.

3.2 Name Creation in Old Occitan

The suffixes introduced in the previous section are used in the formation of names as well as the derivation of adjectives. Naming practices in the Old Occitan area and time period intersect with word formation, particularly the formation of adjectives, in interesting ways. First, many of the names found in Old Occitan texts represent borrowings from the Germanic languages with which Late Latin, or very early Old Occitan, came into contact. Some of these names, like , are borrowed whole, while in other names, only part of the name is borrowed; the rest of the name is built on parts descended from Latin.

Prior to this influx of Germanic people and Germanic names, the Gallo-Romans used single names, unlike the more general Roman tradition of three names. Often when a Gallo-Roman child was named, the child’s name was formed from part of the mother’s name and part of the father’s name. The division between these parts was often, though not always, on a morpheme boundary, resulting in names not unlike compound words.

Many Gallo-Roman names were opaque in meaning, however, because the division between the parts of the parents’ names was not always on a morpheme boundary.

Family identity was established by continued “leading names,” morphemes or parts of names used repeatedly in successive generations, not by the use of a patronymic or multiple names (James 1982).

The naming customs of the and Visigoths, on the other hand, tended toward final-theme variation and outright repetition of names, rather than the repetition of name parts or the alliteration found among other Germanic peoples (Woolf 1936). This practice produced names similar in character to the Gallo-Roman names, often compounds or apparent compounds. It may be this similarity of structure, if not quite of creation, that invited the Gallo-Romans to begin borrowing Germanic names in such high numbers.

Bergh (1941) analyzes the names in an estate survey taken by the bishop of Marseilles in 814. In this survey, Bergh gives 204 names of Germanic origin that are attested before 1200, together with their etymologies. There are more attested Germanic names in this survey than names of Latin origin, although, because some of the names of Latin origin are more common, more people bear Latinate names than names of Germanic origin.

More relevant to the discussion of word formation than the actual borrowing of these names, and parts of names, by Old Occitan speakers is the borrowing of a small number of Germanic morphemes as derivational affixes in Old Occitan. Of these, -art and -enc, which were introduced in the previous section, are arguably the most important.

Finally, the development of names in Old Occitan is relevant to the discussion of word formation because in many cases the boundary between adjectives and names is not at all clear. This is especially notable in the case of adjectives denoting membership in a group of people, such as Israelitienc, Lombardesc, and Cretienc, which are sometimes used as simple adjectives, sometimes as very specific adjectives, and sometimes as address terms in place of an individual’s name. In addition, there are many names that look very much like adjectives derived with the suffixes described in the previous section, but the stem to which the derivational suffix is added is not a meaningful root, or a root with other affixes. Instead, it is simply a string of sounds taken from the names of other people in the family.

In the analysis that follows, names are considered both alongside adjectives created using the same suffixes and separately from those adjectives, in order to discover whether the patterns of name creation and use with these suffixes differ from those of adjective derivation.

3.3 Derived Adjectives in the Concordance de l’Occitan Médiéval

3.3.1 Methodology

In order to explore the patterns of usage of these derived adjectives, I searched the COM exhaustively for these adjective suffixes. For the lyric and non-lyric poetry texts, the “suffix” search function of the COM interface was used. This search function, which is simply a string search that begins at the end of each word form, was used to search for the variants of each suffix, including both spelling variants and case endings. This search produced long lists of words that might have been derived with each suffix.
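As a rough illustration of what an end-anchored string search of this kind does, the following Python sketch filters a list of word forms by a set of endings. The word forms and endings are hypothetical samples (this is not the COM interface or its data); the point is that such a search matches strings of letters, not morphemes, so its output is only a list of candidates.

```python
def suffix_search(word_forms, endings):
    """Return the word forms that end in any of the given endings.

    Like a string search anchored at the end of each word form, this
    matches strings of letters only and knows nothing about morphemes.
    """
    endings = tuple(e.lower() for e in endings)  # str.endswith accepts a tuple
    return [form for form in word_forms if form.lower().endswith(endings)]


# Hypothetical sample of word forms (illustrative, not drawn from the COM).
forms = ["auzart", "auzartz", "ausarda", "galhart", "tenc", "venc",
         "flamenc", "flamencs", "nivol", "estimivol", "cal", "fogal"]

# Hypothetical selection of spelling variants and case endings of -art.
art_endings = ["art", "artz", "ard", "ards", "arda", "ardas"]

print(suffix_search(forms, art_endings))
print(suffix_search(forms, ["enc", "encs"]))
```

The second call returns the verb forms tenc and venc alongside flamenc and flamencs, which is exactly the kind of candidate that has to be removed by hand.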

These lists were then analyzed, and all word forms that were not made up of a stem plus the suffix in question were removed from the lists by hand. The stem could consist of a simple root, as in the case of estimivol ‘calculable’, which can be broken down into the verb root estim- of estimar ‘to calculate’ and the suffix -ivol. In other cases, the stem consists of a root plus other affixes, as in the case of proensalesc ‘Provençal’, mentioned in Section 3.1.1.4. What is important is that words that end in the searched-for string of sounds but do not include the derivational morpheme in question, such as cal, an interrogative pronoun, have been excluded from the analysis that follows. This is particularly important in the case of the suffix -enc. There are many words in Old Occitan that end in the string <enc>⁷ but are not adjectives derived by means of the morpheme -enc. For example, the third-person singular forms of the common verbs tener ‘to hold’, prendre ‘to take’, venir ‘to come’, and devenir ‘to become’ end in this string of sounds: tenc, prenc, venc, and devenc, respectively. In addition, these verbs frequently occur with prefixes, as in endevenir ‘to happen’. If only the strings of sounds were considered, the high frequency of these verb forms would skew the results, and the analysis would describe not the pattern of use of derivational morphemes but merely the pattern of use of certain strings of sounds.

Similarly, the single most common word in Old Occitan ending in the string <ivol> is nivol ‘cloud’, which is a monomorphemic word rather than a root plus the suffix -ivol. In some groups of texts, nivol occurs more often than all other forms ending in <ivol> put together. In order to remove from consideration words that did not seem to be composed of the derivational morpheme in question added to a stem, I relied heavily on Levy’s Provenzalisches Supplement-Wörterbuch, both for recognizing roots and for grouping related words to identify the derivational suffixes used.
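The effect of this hand filtering on the counts can be pictured as removing an exclusion set from the search hits before counting. In this hedged sketch the exclusion set and the token stream are hypothetical: in the study the decisions were made by hand with the help of Levy’s dictionary, not from a fixed list, and a real list would also need prefixed and spelling variants of the verb forms cited above.

```python
from collections import Counter

# Forms that end in the string <enc> but are verb forms, not derivations
# in -enc (the four forms cited in the text; a real list would be longer).
NOT_DERIVED = {"tenc", "prenc", "venc", "devenc"}


def keep_derived(hits, not_derived):
    """Drop hits that were judged not to contain the derivational suffix."""
    return [form for form in hits if form.lower() not in not_derived]


# Hypothetical token stream of search hits; the frequencies are invented.
hits = ["venc"] * 6 + ["tenc"] * 4 + ["prenc"] * 2 + [
    "flamenc", "albenc", "majenc", "flamenc"]

raw = Counter(hits)
clean = Counter(keep_derived(hits, NOT_DERIVED))
print(sum(raw.values()), sum(clean.values()))  # 16 4
print(clean.most_common(1))                    # [('flamenc', 2)]
```

In this invented sample, the verb tokens make up three quarters of the raw hits, which shows how string frequency rather than morpheme frequency would dominate an unfiltered count.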

Once a list of the word forms that included the derivational suffixes under consideration was obtained, a list of the lemmas and all of their variants was developed for each suffix. For each lemma, that is, each distinct combination of stem plus suffix, all possible variants were listed, including both spelling variants and case forms. An example of such a list of variants is given in (1), using the derived adjective auzart ‘daring’.

7 Or or or other spelling variants.

(1) ausart ausarda auzarda ausard auzards auzart ausartz auzartz

The case forms have been combined for several reasons. First, combining them is consistent with the treatment of comparative adjectives in Chapter Two. More importantly, the case forms are sometimes used interchangeably, with one formal case form appearing where, syntactically, we would expect another. In addition, many of these derived adjectives follow the basic adjectival declension of Old Occitan, in which the masculine and feminine declensions use a morpheme that is formally the same, -s, in different cells of the paradigm, while others have distinct masculine and feminine forms.

Keeping the case forms distinct would produce a larger number of categories to consider, but those categories would not have clear boundaries. It seemed most logical to avoid the issue of case forms entirely and consider the derived adjectives only in terms of lemmas and tokens.

In addition, some adjectives have developed a nominal meaning, because they were often used substantively and eventually came to be considered and used as nouns in their own right (Adams 1913, p. 139). For example, pilhart can be used as an adjective meaning ‘thieving’ but can also be used substantively to mean ‘a thief’ or, by extension, ‘a servant’. Some of these examples appear with nominal case endings instead of adjectival case endings. Although the case endings for nouns and adjectives are very similar, they are not identical in all instances. They coincide in enough paradigmatic cells, however, that it would not be possible to separate out all instances in which the derived adjective might be better analyzed as a noun. Instead, all the attestations are considered together, regardless of whether the derived adjective is syntactically best considered an adjective or a noun.

In order to lemmatize the word forms with the derivational suffixes, the complete list of word forms for each suffix in each text type was analyzed. The text type lists were used because they are the most comprehensive: together, the three lists include all of the word forms attested in Old Occitan. Unlike the comparative adjectives discussed in Chapter Two, the derived adjectives were too numerous to analyze accurately by hand, so a computer program was used to locate the listed variants of each lemma, and count the frequency of each variant, within each group of texts to be analyzed: each time period, each dialect, and each text type. The program then created a list of the lemmas found in each group of texts and calculated the number of lemmas and the total number of attestations of each lemma for the group. The program did not identify the variants of each lemma itself; it used the hand-prepared lists of variants to find the lemma and token frequencies.
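The counting step just described can be sketched as follows, assuming the hand-prepared variant lists are stored as a mapping from lemma to variants and each group of texts is available as a list of word tokens. The function names, the group labels, the galhart list, and the token streams are hypothetical; only the auzart variants are taken from (1). As in the description above, the sketch only looks up listed variants and never discovers new ones.

```python
from collections import Counter


def build_index(variant_lists):
    """Invert {lemma: [variants]} into {variant: lemma}."""
    index = {}
    for lemma, variants in variant_lists.items():
        for variant in variants:
            index[variant] = lemma
    return index


def lemma_and_token_counts(groups, variant_lists):
    """For each group of texts, return (number of lemmas, number of tokens).

    Tokens that are not on any hand-prepared variant list are ignored.
    """
    index = build_index(variant_lists)
    summary = {}
    for group, tokens in groups.items():
        per_lemma = Counter(index[t] for t in tokens if t in index)
        summary[group] = (len(per_lemma), sum(per_lemma.values()))
    return summary


VARIANTS = {
    # Variant list (1) from the text.
    "auzart": ["ausart", "ausarda", "auzarda", "ausard", "auzards",
               "auzart", "ausartz", "auzartz"],
    # Hypothetical partial list, for illustration only.
    "galhart": ["galhart", "galharda"],
}

# Hypothetical token streams for two groups of texts.
GROUPS = {
    "lyric": ["auzart", "ausartz", "galhart", "tenc", "auzarda"],
    "prose": ["ausart", "galharda", "galhart"],
}

print(lemma_and_token_counts(GROUPS, VARIANTS))
# {'lyric': (2, 4), 'prose': (2, 3)}
```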

A separate list of lemmas and attested variants of each lemma was developed for common adjectives and for names because of the association of the suffixes -art and -enc with names both before and during the Old Occitan period. For example, the list of the variants of the adjective auzart is given in (1) above, and the list of variants of the name Rabastenc is given in (2).

(2) @Rabastenc @rabastencs @rabastenc @Rabastencha @Rabastencs @Rabastengz @Rabastencx

A combined list of all attestations, both simple adjectives and names, was also developed. For example, israelitenc ‘of or relating to Israel; Israelite’ is used both as an adjective and as a name; all variants of the lemma israelitenc, both names and adjectives, are given in (3).

(3) @Israelitenc israelitienca israelitienc israelitiencs ysraelitienc ysraelitiencx @Israelitienc @Israellitienc


Because many of the variants within the list of names are formally identical to those in the list of adjectives except for the capital letter or @, the total number of lemmas using a particular suffix within a group of texts is smaller than the sum of the numbers of adjective and name lemmas within that group. The total number of attested tokens using a particular suffix, on the other hand, is simply the sum of the adjective and name tokens within that group of texts.
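The asymmetry between lemma totals and token totals follows from treating lemmas as a set and tokens as a sum. A small sketch, assuming (as a simplification) that a name lemma can be matched to an adjective lemma by stripping the @ marker and lowercasing; the lemma labels come from the text, but the token counts are invented for illustration.

```python
from collections import Counter


def normalize(lemma):
    """Match a name such as '@Israelitenc' to the adjective 'israelitenc'."""
    return lemma.lstrip("@").lower()


def combine(adjective_counts, name_counts):
    """Return (combined lemma count, combined token count) for one suffix."""
    adjectives, names = Counter(), Counter()
    for lemma, n in adjective_counts.items():
        adjectives[normalize(lemma)] += n
    for lemma, n in name_counts.items():
        names[normalize(lemma)] += n
    lemmas = set(adjectives) | set(names)                    # shared lemmas counted once
    tokens = sum(adjectives.values()) + sum(names.values())  # tokens simply add up
    return len(lemmas), tokens


# Lemmas in -enc from the text, with invented token counts.
adjs = {"flamenc": 4, "albenc": 2, "israelitenc": 3}
names = {"@Israelitenc": 1, "@Rabastenc": 5, "@Bradenc": 2}
print(combine(adjs, names))  # (5, 17): fewer than 3 + 3 lemmas, but all 17 tokens
```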

3.3.2 Results

In the full COM corpus, 1,338 lemmas of adjectives derived using -al, -art, -enc, -esc, and -ivol were found. These lemmas occurred in 35,889 total attestations. Of the lemmas analyzed, 579 were derived with -al, 278 with -art, 280 with -enc, 110 with -esc, and 91 with -ivol. The token frequency presents a somewhat different pattern, with 26,631 attestations of adjectives derived using -al, 6,667 with -art, 1,328 with -enc, 417 with -esc, and 846 with -ivol. This breakdown is shown in Table 3.1, with the number of lemmas and tokens found and, below each number, the percentage of the total represented by each suffix. Figure 3.1 also shows the breakdown of lemmas and tokens with each suffix in the entire COM corpus.


Table 3.1 Derived Adjectives in the Entire COM Corpus

          -al      -art    -enc    -esc    -ivol   Total
Lemmas    579      278     280     110     91      1,338
          43.3%    20.8%   20.9%   8.2%    6.8%
Tokens    26,631   6,667   1,328   417     846     35,889
          74.2%    18.6%   3.7%    1.2%    2.4%

[Pie charts: “Total Lemmas” and “Total Tokens”, one slice per suffix (-al, -art, -enc, -esc, -ivol)]

Figure 3.1 Percentages of Lemmas and Tokens with Each Suffix in the Entire COM Corpus

The pattern of occurrence of derived adjectives across the COM corpus tells us several things. First, -al is by far the most frequently used of the five derivational suffixes considered. Notably, many of the individual adjectives derived in -al have a very high token frequency, so that the share represented by adjectives in -al, already very high in terms of lemmas, is even higher in terms of tokens. Adjectives derived using -esc, on the other hand, have the lowest token frequency: they account for 8.22% of the lemmas of derived adjectives under consideration, but only 1.16% of the tokens. Similarly, adjectives derived in -enc and -ivol have lower token frequencies than those in -art, and much lower token frequencies than those in -al.
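The percentages in Table 3.1, and the disparity between lemma and token shares just described, can be recomputed from the raw counts. This sketch uses only the counts in Table 3.1; the ratio of token share to lemma share is an illustrative summary statistic introduced here, not one used in the study. A ratio above 1 means a suffix is over-represented among tokens relative to lemmas, as -al is, while a ratio well below 1 describes -esc.

```python
def shares(counts):
    """Return each suffix's percentage of the total count."""
    total = sum(counts.values())
    return {suffix: 100 * n / total for suffix, n in counts.items()}


# Counts from Table 3.1.
LEMMAS = {"-al": 579, "-art": 278, "-enc": 280, "-esc": 110, "-ivol": 91}
TOKENS = {"-al": 26631, "-art": 6667, "-enc": 1328, "-esc": 417, "-ivol": 846}

lemma_pct = shares(LEMMAS)
token_pct = shares(TOKENS)

for suffix in LEMMAS:
    # Ratio > 1: the suffix's share of tokens exceeds its share of lemmas.
    ratio = token_pct[suffix] / lemma_pct[suffix]
    print(f"{suffix:6} {lemma_pct[suffix]:5.1f}% {token_pct[suffix]:5.1f}% {ratio:5.2f}")
```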

Collapsed together as in Table 3.1 and Figure 3.1, however, the derived adjectives used in the COM do not give a complete picture of the patterns of usage of these five derivational suffixes. Separating the regular adjectives derived by means of each suffix from the names derived by means of each suffix gives a better idea of the patterns of usage involved, particularly with -art and -enc, the Germanic suffixes whose pattern of borrowing involved names. Sections 3.3.2.1 and 3.3.2.2 discuss the patterns found when considering only adjectives and only names, respectively.

3.3.2.1 Adjectives

In all, 879 adjective lemmas were found and analyzed, attested in 26,980 tokens. Of the adjective lemmas analyzed, 472 were derived with -al, 117 with -art, 120 with -enc, 79 with -esc, and 91 with -ivol. The token frequency again presents a somewhat different pattern, with 23,983 attestations of adjectives derived using -al, 1,212 with -art, 601 with -enc, 338 with -esc, and 846 with -ivol. This breakdown is shown in Table 3.2, with the number of lemmas and tokens found and, below each number, the percentage of the total represented by each suffix. Figure 3.2 shows the breakdown of adjective lemmas and tokens with each suffix in the entire COM corpus.


Table 3.2 Adjectives in the Entire COM Corpus

          -al      -art    -enc    -esc    -ivol   Total
Lemmas    472      117     120     79      91      879
          53.7%    13.3%   13.7%   9.0%    10.4%
Tokens    23,983   1,212   601     338     846     26,980
          88.9%    4.5%    2.2%    1.3%    3.1%

[Pie charts: “Adjective Lemmas” and “Adjective Tokens”, one slice per suffix (-al, -art, -enc, -esc, -ivol)]

Figure 3.2 Percentages of Adjective Lemmas and Tokens with Each Suffix in the Entire COM Corpus

Although the trend is in many ways the same, the disparity between lemma and token frequencies is even stronger when considering only the adjectives than when considering all derived forms above. The suffix -al, while accounting for just over half the lemmas analyzed, accounts for almost 90% of the tokens. This is again because some of the adjectives created with the derivational suffix -al have a very high token frequency. The other four suffixes, on the other hand, account for a much smaller share of the tokens than of the lemmas.

3.3.2.2 Names

When we turn to the names created with these suffixes, the pattern is quite different. Despite the similarity of the names and adjectives created with these suffixes and the porous boundary between them, the processes of deriving names and adjectives are different and this is reflected in the different patterns found for adjectives and names.

In all, 572 different names were analyzed: 167 with -al, 189 with -art, 183 with -enc, and 33 with -esc. An important difference between the patterns of adjectives and names lies in the use of -ivol. While the suffix is used to derive 91 adjectives, which are attested 846 times, it is never used in the creation of names: in the entire COM corpus, there is not a single instance of a name derived by means of -ivol. This is almost certainly due to the differences between -ivol and the other four suffixes in meaning, combinatorics, and probable date of development or innovation. While the other suffixes can be added to nouns, adjectives, and, in some cases, verbs, -ivol is added only to verb roots, which are not often used in name creation. In addition, while the other four suffixes have less concrete meanings, adding something like ‘related to’ or the possession of a quality, or, when added to adjectives, adding nothing at all to the meaning of the stem, -ivol has a very definite meaning of ‘possibility’. Finally, because there is no history of such a use in the etymology of the suffix, as there is with -art, -enc, and -esc, the use of -ivol in names would be an innovation during this time period, and the restrictions on its combinatorics and meaning make such an innovation unlikely.

The token frequency presents a somewhat different pattern from that of the lemma frequency, with 2,655 attestations of names derived using -al, 5,443 with -art, 727 with -enc, and 79 with -esc. This breakdown is shown in Table 3.3 and Figure 3.3.

Table 3.3 Names in the Entire COM Corpus

          -al      -art    -enc    -esc    Total
Lemmas    167      189     183     33      572
          29.2%    33.0%   32.0%   5.8%
Tokens    2,655    5,443   727     79      8,904
          29.8%    61.1%   8.2%    0.9%

[Pie charts: “Name Lemmas” and “Name Tokens”, one slice per suffix (-al, -art, -enc, -esc)]

Figure 3.3 Percentages of Name Lemmas and Tokens with Each Suffix in the Entire COM Corpus


It is important to note that, unlike in the case of the adjectives and of the derived forms as a whole, -al does not account for the overwhelming majority of tokens. Instead, names in -art account for slightly more lemmas and about twice as many tokens as those derived with -al. Thus, considering the names separately from regular adjectives is an important part of analyzing the patterns of use of these derivational suffixes; considering only the derived forms as a whole, without separating the names from the adjectives, fails to bring this difference to light.

It is also worth considering the proportion of adjectives and names derived with each of the suffixes. Because there is some overlap between the two (for example, Auzart ‘daring’ is sometimes used as a name or address term and more often as an adjective), adding them and considering the percentage of total lemmas and tokens represented by adjectives and names is difficult to do quantitatively. It is important to note, however, that in addition to the striking and crucial difference of -ivol discussed above, -al and -esc are used more frequently to derive adjectives, in terms of both lemmas and tokens, than to derive names. The opposite is true for -art and -enc, which are used more frequently, in terms of both lemmas and tokens, in name derivation than in the derivation of simple adjectives. These different patterns are particularly striking for -al and -art because of the high token frequency of the forms derived with each suffix. In the case of -art, it is the names derived with -art that outnumber both the names derived with the other suffixes and the adjectives derived with -art; for -al, it is the high token frequency of the adjectives derived with the suffix that overwhelms both the adjectives derived with other suffixes and the names derived with -al.
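The comparison in the preceding paragraph can be checked mechanically against the counts in Tables 3.2 and 3.3. This sketch compares raw counts only and, as the text cautions, ignores the overlap between adjectives and names; the labels it returns are a convenience introduced here.

```python
# (lemmas, tokens) from Table 3.2 (adjectives) and Table 3.3 (names).
ADJECTIVES = {"-al": (472, 23983), "-art": (117, 1212), "-enc": (120, 601),
              "-esc": (79, 338), "-ivol": (91, 846)}
NAMES = {"-al": (167, 2655), "-art": (189, 5443), "-enc": (183, 727),
         "-esc": (33, 79)}  # no names in -ivol are attested


def dominant_use(adjectives, names):
    """Label each suffix by whether adjectives or names lead on both measures."""
    labels = {}
    for suffix, (adj_lemmas, adj_tokens) in adjectives.items():
        name_lemmas, name_tokens = names.get(suffix, (0, 0))
        if adj_lemmas > name_lemmas and adj_tokens > name_tokens:
            labels[suffix] = "adjectives"
        elif name_lemmas > adj_lemmas and name_tokens > adj_tokens:
            labels[suffix] = "names"
        else:
            labels[suffix] = "mixed"
    return labels


print(dominant_use(ADJECTIVES, NAMES))
# {'-al': 'adjectives', '-art': 'names', '-enc': 'names', '-esc': 'adjectives', '-ivol': 'adjectives'}
```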

3.3.2.3 -al Revisited

The tokens of -al are so overwhelming, in fact, that I exclude them from the analyses that follow. The lemmas and tokens of adjectives derived with -al are so numerous that they obscure the patterns of the other suffixes. More importantly, none of the parameters discussed in the following sections makes any difference at all to the pattern of use of the suffix -al. Regardless of which parameter is considered and which group of texts is examined, adjectives derived in -al account for the majority of the lemmas analyzed and for the vast majority of the tokens. While there are some cases where the percentage of lemmas represented by adjectives in -al is below 60% or the percentage of tokens is below 80%, these cases are best explained in terms of the patterns of the other adjectives. That is, an apparent drop in the use of -al simply reflects a surge in the use of another suffix, and is better explained by the use of that suffix than by the use of -al itself. In addition, in some of the smaller groups of texts, such as the eleventh-century texts or those of the Limousin dialect, the total numbers of attestations are so small that even very frequent words occur only a few times, and the totals within the time period or dialect can be skewed by what is essentially a sampling issue.

The distorting effect of -al is much less pronounced when considering only names, because -al is much less frequent in names than in adjectives, in terms of both lemmas and tokens. The suffix is excluded from the analysis of names as well. This is partly for consistency and partly because, as with adjectives, none of the parameters under consideration seems to affect its pattern of use. It is used consistently across all time periods, dialects, and text types.

The consistency of the use of the suffix -al, in contrast to the other suffixes, may simply be because the suffix is so very frequent. Like the high-frequency synthetic comparative formations discussed in Chapter Two, this suffix is heard and used frequently in a range of contexts. It is therefore less likely to become associated with one specific type of text or dialect. Derivational suffixes with lower frequency, on the other hand, may be more susceptible to changes in pattern or association.

In addition, the suffix -al is inherited directly from a very common Latin derivational suffix rather than being an Old Occitan innovation in form or meaning. Perhaps -al is less susceptible to association with any dialect or other parameter because it is so entrenched in the grammar of the language. The other suffixes discussed in this chapter, however, are both less frequent and more recent innovations in form, meaning, or both. Because of this, they may not have spread to all dialects and usages.

In any case, the suffix -al is excluded from the following discussions for two reasons. Its pattern of usage seems to be universal and unaffected by the parameters of date, dialect, and text type. Its extremely high frequency also obscures the effects these parameters have on the other suffixes. In the rest of the chapter, I focus on the remaining four derivational suffixes, -art, -enc, -esc, and -ivol. I examine them in terms of dialect, date, and text type in Sections 3.4, 3.5, and 3.6 respectively.

3.4 Analysis of Derived Adjectives by Dialect of Text

In this section, I consider the use of derived adjectives by the dialect of the text. As in the analysis of comparative adjectives by dialect in Chapter Two, only non-lyric poetry and prose texts with moderately secure locations are included in this analysis. These texts were divided into seven dialects, as described in Section 1.7.2. In each group of texts, all lemmas and tokens of each of the five suffixes were analyzed, though -al is not included in the discussion that follows, as explained above.

I begin with dialect because of the suffix -ivol. This suffix is described as being found only in the Waldensian dialect (Adams 1913, Anglade 1977). Part of the purpose of this analysis is therefore to test this claim. I look at all texts in all dialects of Old Occitan to see whether the use of this suffix is accurately described as dialectally based.

First, I consider all attestations of derivations involving each suffix in each dialect, including both adjectives and names. The breakdown of lemmas derived with each suffix is shown in Table 3.4. For each suffix, the table gives the number of lemmas found and, in parentheses, the percentage of the dialect's total that the suffix represents.

Table 3.4 Total Derived Lemmas by Dialect of Text

Dialect            -art         -enc         -esc         -ivol        Total
Alpine Provençal   19 (82.6%)   4 (17.4%)    0 (0%)       0 (0%)       23
Auvernhat          6 (66.7%)    3 (33.3%)    0 (0%)       0 (0%)       9
Gascon             74 (57.4%)   28 (21.7%)   27 (20.9%)   0 (0%)       129
Languedoc          81 (68.6%)   31 (26.3%)   6 (5.1%)     0 (0%)       118
Limousin           3 (50.0%)    1 (16.7%)    2 (33.3%)    0 (0%)       6
Provençal          63 (73.3%)   14 (16.3%)   9 (10.5%)    0 (0%)       86
Waldensian         7 (4.8%)     62 (42.8%)   0 (0%)       76 (52.4%)   145

It is immediately obvious that the patterns of usage differ from dialect to dialect. None of the four derivational suffixes shown in Table 3.4 is used in consistent proportions across all seven dialects. In fact, the overall difference in the pattern of lemmas using each suffix is statistically significant (p < 0.0001).8

First, the analysis shown in Table 3.4 confirms the claim that the use of -ivol is circumscribed by dialect. It is important to note, however, that the COM as a whole contains more attestations of -ivol than those found in the group of texts that can be securely located in the Waldensian dialect area. The texts that contain the few other attestations cannot be located securely. The suffix -ivol is used to derive an astonishing 76 lemmas in the Waldensian dialect. It is not found in texts that can be securely located in any other dialect region. The difference in the pattern of the use of -ivol, and its correlation with the Waldensian dialect, is highly statistically significant (p < 0.0001).

8 This and other p-values in this chapter, unless otherwise noted, were found using a Fisher’s exact test with the statistical package R.

The pattern of use of -ivol clearly shows that the suffix is indeed dialectally constrained, but is highly productive within the Waldensian dialect.
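Footnote 8 states that these p-values come from Fisher's exact test in R, but it does not say which tables were tested. The minimal, standard-library Python sketch below illustrates the test by collapsing Table 3.4 into 2×2 tables. Each table compares Waldensian with the six other dialects combined, and one suffix with the other three. That collapse is my assumption, so the printed p-values illustrate the test rather than reproduce the reported ones. The function and variable names are hypothetical. The two-sided rule sums every table that is no more probable than the observed one, with a small relative tolerance; this follows the convention of R's `fisher.test`.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    With row and column totals held fixed, the top-left count follows a
    hypergeometric distribution; the p-value sums the probabilities of all
    tables that are no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance keeps floating-point ties from being dropped.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Lemma totals from Table 3.4: Waldensian vs. the six other dialects combined.
WALDENSIAN_TOTAL = 145
OTHER_TOTAL = 23 + 9 + 129 + 118 + 6 + 86  # = 371

# -ivol vs. all other suffixes: 76 Waldensian lemmas, 0 elsewhere.
p_ivol = fisher_exact_2x2(76, WALDENSIAN_TOTAL - 76, 0, OTHER_TOTAL)

# -art vs. all other suffixes: 7 Waldensian lemmas, 19+6+74+81+3+63 = 246 elsewhere.
p_art = fisher_exact_2x2(7, WALDENSIAN_TOTAL - 7, 246, OTHER_TOTAL - 246)

print(f"-ivol: p = {p_ivol:.3g}")
print(f"-art:  p = {p_art:.3g}")
```

Both collapsed comparisons fall far below p < 0.0001. A full table with all seven dialects and four suffixes, as R's `fisher.test` can handle, needs a network algorithm or simulation rather than this direct sum.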

In addition, Waldensian texts show a preference for words derived with the suffix -enc. This suffix occurs significantly more frequently in this dialect than in the others (p < 0.001). The suffix -art, on the other hand, is used to derive a much smaller percentage of words in the Waldensian dialect than in the other dialects.

The Gascon dialect uses the suffix -esc in a larger percentage of words than the other dialects. The exception is Limousin, where the extremely small number of total lemmas (only six) keeps us from drawing any conclusions. The Alpine Provençal, Languedoc, and Provençal dialects show somewhat similar patterns of usage. All three favor -art, followed by -enc, with very few, if any, lemmas derived with -esc and none at all with -ivol. The Auvernhat and Limousin dialects seem to show the same pattern, although it is hard to tell with so few tokens.

An analysis of the tokens of words derived with each suffix gives essentially the same pattern. A breakdown of the tokens using each suffix that are attested in each dialect is given in Table 3.5.

Table 3.5 Total Derived Tokens by Dialect of Text

Dialect            -art          -enc          -esc         -ivol         Total
Alpine Provençal   58 (85.3%)    10 (14.7%)    0 (0%)       0 (0%)        68
Auvernhat          14 (70.0%)    6 (30.0%)     0 (0%)       0 (0%)        20
Gascon             674 (78.5%)   100 (11.6%)   85 (9.9%)    0 (0%)        859
Languedoc          528 (83.0%)   93 (14.6%)    15 (2.4%)    0 (0%)        636
Limousin           3 (50.0%)     1 (16.7%)     2 (33.3%)    0 (0%)        6
Provençal          355 (88.3%)   29 (7.2%)     18 (4.5%)    0 (0%)        402
Waldensian         92 (10.5%)    212 (24.2%)   0 (0%)       572 (65.3%)   876

As in the case of the lemmas discussed above, the overall difference between the dialects is statistically significant (p < 0.0001). The most obvious difference in the pattern of tokens is again in the Waldensian dialect and concerns words derived with -ivol. As with the lemmas, this difference is highly statistically significant (p < 0.0001). The quantitative data here very clearly confirm the dialectal claims made about -ivol in the literature. Aside from this difference, the percentage of tokens found with -art is much higher than the percentage of lemmas. This shows that some of the individual lemmas derived with -art have a very high token frequency. Figure 3.4 shows the percentages of lemmas and tokens with each suffix that are found in each dialect.

[Figure: two stacked bar panels, "Lemmas by Dialect" and "Tokens by Dialect"; y-axis 0% to 100%; series -art, -enc, -esc, -ivol]

Figure 3.4 Percentages of Lemmas and Tokens with Each Suffix by Dialect of Text

Figure 3.4 makes the different pattern of the Waldensian dialect very clear. The bar representing the -ivol suffix appears only in this dialect. It accounts for over half of the lemmas and tokens found in the Waldensian dialect. In every other dialect, -art is the most frequently used of the four suffixes shown in Figure 3.4, especially when considering token frequency. The Waldensian dialect, on the other hand, has a much smaller proportion of forms derived with -art than any of the other dialects. This difference in the use of -art between the Waldensian dialect and the other dialects is again statistically significant (p < 0.0001).

This difference may be unsurprising, because the Waldensian dialect is geographically farther from direct contact with Germanic-speaking people than the other dialects. It is important to note, however, that the Waldensian dialect uses far fewer forms derived with -art than the other dialects but not fewer words derived with -enc, though both suffixes are of Germanic origin.

We might also expect a second geographically based difference. The suffixes of Germanic origin, -art and -enc, might be used more frequently in the dialect areas closest to speakers of Germanic languages: the Auvernhat and Limousin dialects. This is not the case, however. If we include the forms derived with -al, to get a better comparison, the Auvernhat and Limousin dialects use the largest proportion of forms with -al and some of the smallest proportions of forms with -art and -enc among the dialects. This distribution is likely caused by the small number of words and texts that can be located in each of these dialect areas. It clearly does not, however, reflect a higher usage of suffixes of Germanic origin in the geographical areas closest to speakers of Germanic languages.

In addition, Figure 3.4 shows that the percentage of lemmas found with -enc and -esc is higher than the percentage of tokens of words with these suffixes. This shows that words derived with -enc and -esc tend to have a low token frequency. The opposite, however, is true of the suffix -art. In all of the dialects except Waldensian, the percentage of tokens found with -art is considerably higher than the percentage of lemmas. This highlights the high token frequency of some of the individual lemmas derived with -art.
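The lemma-versus-token contrast can be made concrete by dividing each suffix's token count by its lemma count. The short Python sketch below does this for the three central dialects, using the counts in Tables 3.4 and 3.5. The choice of dialects and the helper's name are my own, for illustration only. It gives roughly 5.6 to 9.1 tokens per -art lemma, against roughly 2.0 to 3.6 for -enc and -esc.

```python
# Lemma and token counts per suffix for the three central dialects,
# taken from Tables 3.4 and 3.5.
LEMMAS = {
    "Gascon": {"-art": 74, "-enc": 28, "-esc": 27},
    "Languedoc": {"-art": 81, "-enc": 31, "-esc": 6},
    "Provençal": {"-art": 63, "-enc": 14, "-esc": 9},
}
TOKENS = {
    "Gascon": {"-art": 674, "-enc": 100, "-esc": 85},
    "Languedoc": {"-art": 528, "-enc": 93, "-esc": 15},
    "Provençal": {"-art": 355, "-enc": 29, "-esc": 18},
}


def tokens_per_lemma(lemmas, tokens):
    """Mean tokens per attested lemma, by dialect and suffix (zero-lemma cells skipped)."""
    return {
        dialect: {sfx: tokens[dialect][sfx] / n for sfx, n in counts.items() if n > 0}
        for dialect, counts in lemmas.items()
    }


ratios = tokens_per_lemma(LEMMAS, TOKENS)
for dialect, by_suffix in ratios.items():
    print(dialect, {sfx: round(r, 1) for sfx, r in by_suffix.items()})
```

A ratio is easier to compare across dialects than two separate percentages, although it still hides which individual lemmas carry the high counts.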

Many of the words derived with -art that have such high token frequencies turn out to be names. When we separate out the adjectives and the names derived with each suffix, the different patterns found in each dialect become even clearer. Table 3.6 shows the number and percentage of adjective lemmas and tokens in each dialect; the percentage of each is also shown in Figure 3.5.

Table 3.6 Adjective Lemmas and Tokens by Dialect of Text

Lemmas
Dialect            -art         -enc         -esc         -ivol        Total
Alpine Provençal   6 (100%)     0 (0%)       0 (0%)       0 (0%)       6
Auvernhat          2 (100%)     0 (0%)       0 (0%)       0 (0%)       2
Gascon             27 (43.5%)   10 (16.1%)   25 (40.3%)   0 (0%)       62
Languedoc          23 (52.2%)   16 (36.3%)   5 (11.3%)    0 (0%)       44
Limousin           2 (66.6%)    1 (33.3%)    0 (0%)       0 (0%)       3
Provençal          16 (53.3%)   6 (20.0%)    8 (26.6%)    0 (0%)       30
Waldensian         6 (4.8%)     42 (33.9%)   0 (0%)       76 (61.3%)   124

Tokens
Dialect            -art         -enc         -esc         -ivol        Total
Alpine Provençal   20 (100%)    0 (0%)       0 (0%)       0 (0%)       20
Auvernhat          2 (100%)     0 (0%)       0 (0%)       0 (0%)       2
Gascon             132 (55.7%)  32 (13.5%)   73 (30.8%)   0 (0%)       237
Languedoc          116 (62.3%)  56 (30.1%)   14 (7.5%)    0 (0%)       186
Limousin           2 (66.6%)    1 (33.3%)    0 (0%)       0 (0%)       3
Provençal          38 (52.0%)   18 (24.6%)   17 (23.2%)   0 (0%)       73
Waldensian         17 (2.3%)    141 (19.3%)  0 (0%)       572 (78.4%)  730

[Figure: two stacked bar panels, "Adjective Lemmas by Dialect" and "Adjective Tokens by Dialect"; y-axis 0% to 100%; series -art, -enc, -esc, -ivol]

Figure 3.5 Percentages of Adjective Lemmas and Tokens with Each Suffix by Dialect of Text

When we consider only the adjectives, the dialect-based patterns discussed above are still visible. The Waldensian dialect continues to stand out because of its use of the suffix -ivol. The patterns of usage found in the Languedoc and Provençal dialects are very similar to one another. Also, as seen above, the Gascon dialect uses more words in -esc and fewer in -enc than the central dialects, though the Provençal dialect also uses words derived with -esc somewhat more frequently than the Languedoc dialect. Interestingly, there seems to be little difference between the percentage of lemmas and the percentage of tokens represented by each suffix.

These dialect-based patterns are further reinforced by cases in which the same root appears in the COM corpus with more than one of the suffixes under discussion here. For example, both sarrazinal and sarrazinesc ‘Saracen’ are found in the corpus, as well as fogal and foguenc ‘fiery’. Table 3.7 shows where the derivations with each suffix occur in each group of texts. It considers only derivations of roots that are found in the corpus with more than one suffix.9

9 Some of these words reflect alternations between the suffix -al and one or more of the other suffixes, such as sarrazinal and sarrazinesc ‘Saracen’. Table 3.7 nevertheless does not include the suffix -al, for consistency with the other tables in this section. As discussed in Section 3.3.2.3 above, however, derivations with -al are very frequent in all dialects. They account for well over half of the tokens of roots derived with more than one suffix in all seven dialects.

Table 3.7 Tokens of Roots Which are Derived with Multiple Suffixes Found with Each Suffix in Each Dialect

Dialect            -art         -enc          -esc         -ivol        Total
Alpine Provençal   7 (87.5%)    0 (0%)        0 (0%)       0 (0%)       6
Auvernhat          0 (0%)       0 (0%)        0 (0%)       0 (0%)       0
Gascon             51 (73.9%)   5 (7.2%)      13 (18.8%)   0 (0%)       69
Languedoc          58 (87.9%)   5 (7.6%)      3 (4.5%)     0 (0%)       66
Limousin           0 (0%)       1 (100.0%)    0 (0%)       0 (0%)       1
Provençal          24 (66.7%)   4 (11.1%)     8 (22.2%)    0 (0%)       36
Waldensian         0 (0%)       58 (65.9%)    0 (0%)       30 (34.1%)   88

The Waldensian dialect has a strikingly low number of adjectives derived with -art, and in this dialect the suffix is avoided whenever there is an option. That is, where the corpus shows an alternation between a derivation with -art and a derivation with some other suffix, the Waldensian texts consistently use the other suffix, whichever it might be. For example, both bestial and bestiart ‘bestial’ are found in the Old Occitan corpus, but the texts from the Waldensian dialect use only bestial. Both bestial and bestiart are found in the Gascon, Languedoc, and Provençal dialect areas. Similarly, though both noitart and noitenc ‘relating to the night’ are found in the Old Occitan corpus, texts from the Waldensian dialect use only noitenc.

From the discussion above, we might expect the Languedoc and Provençal dialects to show very similar patterns of derivation for these adjectives, but they do not. This is caused by a single text, the eleventh-century Provençal Chanson de Sainte Foy, which is an unusual text in many ways. The author of this text uses the form homenesc ‘’, which alternates with the -al derivation, homenal, elsewhere in the corpus. This usage seems to be simply a matter of the individual style and lexicon of the author of the Chanson de Sainte Foy. If this text is removed from consideration, the remaining patterns of adjective derivation in the Languedoc and Provençal dialects are, as expected, very similar to one another. Both show a large number of derivations in -art and a smaller number in -enc and -esc.

Turning then to the names derived with each suffix, the pattern is somewhat different. As noted in Section 3.3.2 above, there are no names in -ivol attested in Old Occitan, so we can only consider names in -art, -enc, and -esc. Table 3.8 gives the number and percentage of lemmas and tokens of names found in each dialect for these three suffixes; the percentage of each is also shown in Figure 3.6.

Table 3.8 Name Lemmas and Tokens by Dialect of Text

Lemmas
Dialect            -art         -enc         -esc        Total
Alpine Provençal   13 (81.3%)   3 (18.8%)    0 (0%)      16
Auvernhat          5 (62.5%)    3 (37.5%)    0 (0%)      8
Gascon             50 (70.4%)   19 (26.8%)   2 (2.8%)    71
Languedoc          59 (77.6%)   16 (21.1%)   1 (1.3%)    76
Limousin           1 (33.3%)    0 (0%)       2 (66.7%)   3
Provençal          52 (85.3%)   8 (13.1%)    1 (1.6%)    61
Waldensian         1 (3.6%)     27 (96.4%)   0 (0%)      28

Tokens
Dialect            -art         -enc         -esc        Total
Alpine Provençal   38 (79.2%)   10 (20.8%)   0 (0%)      48
Auvernhat          12 (66.7%)   6 (33.3%)    0 (0%)      18
Gascon             542 (87.1%)  68 (10.9%)   12 (1.9%)   622
Languedoc          412 (91.6%)  37 (8.2%)    1 (0.2%)    450
Limousin           1 (33.3%)    0 (0%)       2 (66.7%)   3
Provençal          317 (96.4%)  11 (3.3%)    1 (0.3%)    329
Waldensian         75 (51.4%)   71 (48.6%)   0 (0%)      146

[Figure: two stacked bar panels, "Name Lemmas by Dialect" and "Name Tokens by Dialect"; y-axis 0% to 100%; series -art, -enc, -esc]

Figure 3.6 Percentages of Name Lemmas and Tokens with Each Suffix by Dialect of Text

Even setting aside -ivol, whose usage is clearly dialect-based, there are important dialect-based differences in the use of each suffix in names. The Waldensian dialect continues to stand out. Compared with the other dialects, it has a much larger percentage of name lemmas and tokens in -enc and a much smaller percentage of name lemmas with -art. All 75 tokens of names in -art in the Waldensian dialect are tokens of a single name, Bernart. By contrast, the other dialects, particularly Gascon, Languedoc, and Provençal, have a large number of different names in -art. Because many of these names are extremely common, these dialects also have a very large number of tokens of names in -art. The pattern of name derivation in the Limousin dialect looks very different, but very few names derived with any of these suffixes appear in Limousin texts.

Because names derived with these suffixes occur so rarely in Limousin texts, no conclusions can be drawn from the Limousin data. It is clear, however, that the pattern of name derivation and creation differs greatly among the dialects.

In conclusion, then, the dialect in which a text is written affects the pattern of usage of derivational suffixes in important ways. First, as suggested by Adams and others, the derivational suffix -ivol does seem to be dialectally restricted. It occurs in only one dialect area, the Waldensian. In texts from that region, it occurs very frequently and with a high level of productivity.

In addition, the Waldensian dialect differs from the other dialects in its relative avoidance of both adjectives and names derived with -art, except for the name Bernart. Both adjectives and names in -art are very common in the other dialects and pattern similarly in the central dialects, but they are used much less frequently in Waldensian. Beyond the use of -ivol in adjectives, texts written in the Waldensian dialect show a significantly higher use of adjectives and names in -enc than is found in the other dialects.

There are also smaller differences between the Gascon, Languedoc, and Provençal dialects in the percentage of adjectives in -enc and -esc. Gascon uses the most -esc and the least -enc, while the Languedoc dialect uses the most -enc and the least -esc.

3.5 Analysis of Derived Adjectives by Date of Text

In this section, I consider the use of the derivational suffixes by the date of the text. As in the analysis by date in Chapter Two, the texts were analyzed in groups of fifty years, and only non-lyric poetry and prose texts with moderately secure dates are included in this analysis. These decisions are discussed in Section 1.7.1. In each time period, all lemmas and tokens formed with the suffixes -al, -art, -enc, -esc, and -ivol were analyzed, though -al is not included in the discussion which follows.10 Table 3.9 shows the breakdown of lemmas derived using each suffix by the date of text. Figure 3.7 shows the percentage of total lemmas in each time period represented by each suffix.

10 An analysis of the lemmas and tokens in -al shows no difference in pattern across the time period of Old Occitan. The pattern of usage was the same across all time periods when considering adjectives, names, and all derived forms.

Table 3.9 Total Derived Lemmas by Date of Text

Period     -art         -enc         -esc         -ivol        Total
11th c     1 (16.7%)    0 (0%)       5 (83.3%)    0 (0%)       6
1100-1150  15 (75.0%)   3 (15.0%)    2 (10.0%)    0 (0%)       20
1150-1200  16 (69.6%)   7 (30.4%)    0 (0%)       0 (0%)       23
1200-1250  59 (54.6%)   34 (31.5%)   15 (13.9%)   0 (0%)       108
1250-1300  74 (58.3%)   38 (29.9%)   13 (10.2%)   2 (1.6%)     127
1300-1350  88 (34.6%)   77 (30.3%)   32 (12.6%)   57 (22.4%)   254
1350-1400  68 (55.3%)   24 (19.5%)   16 (13.0%)   15 (12.2%)   123
1400-1450  40 (27.6%)   39 (26.9%)   3 (2.1%)     63 (43.4%)   145
1450-1500  42 (28.6%)   56 (38.1%)   5 (3.4%)     44 (29.9%)   147

[Figure: stacked bar chart by period; y-axis 0% to 100%; series -art, -enc, -esc, -ivol]

Figure 3.7 Percentages of Lemmas Derived with Each Suffix by Date of Text

If the date of the text had no effect on the pattern of use of these suffixes, we would expect consistent percentages across all time periods, as is in fact the case with the suffix -al. Instead, we see considerable change across the Old Occitan period in the use of these derivational suffixes. The most obvious trend is the development of the suffix -ivol, which is not found in any texts before 1250 but increases rapidly after that date. In Section 3.4 above, however, I confirmed that the use of -ivol is based on the dialect of the text. The date-based trend shown here is merely an artifact of that dialect-based difference.

Most of the Waldensian texts we have are from the latter half of the Old Occitan period. When we examine closely which texts include words derived in -ivol, we find that they occur, as expected, only in texts written in the Waldensian dialect area, regardless of when they were written. The Nouveau Testament Vaudois de Zurich, a mid-fifteenth-century text written in the Waldensian dialect, contains dozens of tokens of adjectives derived with -ivol. Other texts written in the same time period contain no attestations of the suffix -ivol at all. This holds both for texts with fairly secure locations, such as the Languedoc text Thesaur de Pauvres + Appendice Chantilly, and for texts whose location is not at all secure. Based on this investigation, we can say that the use of the suffix -ivol to derive adjectives is associated with the dialect of the text. The apparent date-based trend is merely an artifact of the accidental distribution of texts, since we have texts from the Waldensian region only after 1250.

The earliest time periods look as if they have an entirely different pattern of usage, but only because so few texts can be securely dated to these early periods. There are five lemmas derived with -esc in the earliest time period, all of which come from the same text, the Chanson de Sainte Foy. With so few texts from this period, it is impossible to tell whether this more frequent use of adjectives derived with -esc is a feature of the period or of this individual text. The other texts of this time period contain no attestations of this suffix, so it could very well be simply part of the individual style of the author of the Chanson de Sainte Foy. Similarly, there are very few forms derived with -art in this text, which is the longest text in this time period. The other texts, instead of using the suffix -esc, include several forms in -art. This pattern is more like that of texts in the later, more robustly attested time periods.

If there were more extant texts from this time period, we might be able to determine whether the different pattern found in the Chanson de Sainte Foy reflects a chronological difference or simply the individual style of its author. With the small amount of data in the other short texts from the eleventh century, however, it seems best to attribute the difference to the style of the author of the Chanson de Sainte Foy rather than to the date of the text.

Other trends, though, do seem to be genuinely based on the date of the text. There seems to be a steady, though not dramatic, decline in the use of words derived with -art during the Old Occitan period. In the twelfth and thirteenth centuries, the majority of lemmas analyzed here are derived with -art. By the fifteenth century, the percentage of lemmas derived with -art has dropped to about 30%, approximately equal to those derived with -ivol and lower than those derived with -enc. This is a significant drop (in a chi-squared test for trend,11 χ² = 38.257, p < 0.0001).

Part of this trend is, like the dramatic increase in -ivol, an artifact of the dialect difference discussed above. The Waldensian dialect has a much smaller percentage of forms derived in -art than the other dialects. Because texts from this dialect are available only in the later time periods, the Waldensian pattern affects only these periods. This accounts for some of the drop in the usage of -art, but it does not explain the drop as completely as it explains the increase in -ivol. Unlike that apparent change, the drop in the frequency of -art in the later periods also appears in texts from other dialects and in texts with no secure location. The date of the text thus does affect the pattern of usage of this suffix; its use decreases in the later part of the Old Occitan period.

In addition, there is a slight decrease in the percentage of words with -esc. Though much less dramatic than the decrease in -art or the apparent increase in -ivol, the decrease in lemmas with -esc is nonetheless significant (in a chi-squared test for trend, χ² = 18.395, p < 0.0001).
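Footnote 11 identifies this test as the Cochran-Armitage test for trend. The minimal Python sketch below shows how the statistic is computed from the counts in Tables 3.9 and 3.10. It is the same statistic that R's `prop.trend.test` reports. Scoring the nine periods 1 through 9 is my assumption, and the function name is hypothetical. With those scores, the sketch gives χ² ≈ 38.257 for -art lemmas and 18.395 for -esc lemmas, as reported here. It also gives 306.21 for -enc tokens, the value reported for tokens in the next part of this section.

```python
import math


def trend_test(successes, totals, scores=None):
    """Chi-squared test for trend in proportions (Cochran-Armitage).

    successes[i] of totals[i] items in group i carry the feature of interest.
    Scores default to 1, 2, ..., k. Returns (statistic, p-value) on 1 df.
    """
    scores = list(scores) if scores is not None else list(range(1, len(totals) + 1))
    n_total = sum(totals)
    p_bar = sum(successes) / n_total
    # Score-weighted sum of observed-minus-expected counts under "no trend".
    t = sum(s * (r - n * p_bar) for s, r, n in zip(scores, successes, totals))
    sum_ns = sum(n * s for s, n in zip(scores, totals))
    sum_ns2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    chi2 = t * t / variance
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Nine periods, 11th c. through 1450-1500, scored 1..9 (an assumption).
LEMMA_TOTALS = [6, 20, 23, 108, 127, 254, 123, 145, 147]  # Table 3.9
ART_LEMMAS = [1, 15, 16, 59, 74, 88, 68, 40, 42]
ESC_LEMMAS = [5, 2, 0, 15, 13, 32, 16, 3, 5]
TOKEN_TOTALS = [9, 119, 60, 1047, 968, 1184, 550, 986, 666]  # Table 3.10
ENC_TOKENS = [0, 4, 8, 70, 215, 244, 52, 387, 236]

for label, hits, totals in [("-art lemmas", ART_LEMMAS, LEMMA_TOTALS),
                            ("-esc lemmas", ESC_LEMMAS, LEMMA_TOTALS),
                            ("-enc tokens", ENC_TOKENS, TOKEN_TOTALS)]:
    chi2, p = trend_test(hits, totals)
    print(f"{label}: chi2 = {chi2:.3f}, p = {p:.2g}")
```

Unlike a plain chi-squared test of independence, the trend test asks only whether the proportion moves steadily with the period score. A single anomalous half-century, such as 1350-1400, therefore weakens but does not cancel the result.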

When we consider the number of tokens of words derived with each suffix, we find a similar set of patterns, as shown in Table 3.10 and Figure 3.8.

11 Also called the Cochran-Armitage test for trend; see Agresti (1990).

Table 3.10 Total Derived Tokens by Date of Text

Period     -art          -enc          -esc         -ivol         Total
11th c     1 (11.1%)     0 (0%)        8 (88.9%)    0 (0%)        9
1100-1150  108 (90.8%)   4 (3.4%)      7 (5.9%)     0 (0%)        119
1150-1200  52 (86.7%)    8 (13.3%)     0 (0%)       0 (0%)        60
1200-1250  946 (90.4%)   70 (6.7%)     31 (3.0%)    0 (0%)        1,047
1250-1300  721 (74.5%)   215 (22.2%)   30 (3.1%)    2 (0.2%)      968
1300-1350  600 (50.7%)   244 (20.6%)   73 (6.2%)    267 (22.6%)   1,184
1350-1400  424 (77.1%)   52 (9.5%)     57 (10.4%)   17 (3.1%)     550
1400-1450  231 (23.4%)   387 (39.2%)   3 (0.3%)     365 (37.0%)   986
1450-1500  237 (35.6%)   236 (35.4%)   12 (1.8%)    181 (27.2%)   666

[Figure: stacked bar chart by period; y-axis 0% to 100%; series -art, -enc, -esc, -ivol]

Figure 3.8 Percentages of Tokens Derived with Each Suffix by Date of Text

In many ways, the token frequencies present the same patterns through time as the lemma frequencies discussed above. For example, the decrease in the frequency of forms in -art is even clearer here. Because the words derived in -art tend to have fairly high token frequencies, the drop in token frequency is more dramatic. One trend emerges in the token frequencies that was not obvious in the lemma frequencies: the increase in the use of words derived with -enc. Apart from the anomalous half-century from 1350 to 1400, the token frequency of words in -enc rises significantly through the Old Occitan period (in a chi-squared test for trend, χ² = 306.21, p < 0.0001). Like the decrease in -art, however, this increase is again partially an artifact of the fact that texts in the Waldensian dialect are attested only after 1250. If the Waldensian texts are removed from consideration, the remaining increase in -enc is not significant. Analyzing the forms derived with these suffixes by the date of the text does seem to show some changes in the pattern of derivation over time. These changes, however, are not as distinct as the dialect differences discussed in the previous section.

If we consider the adjectives and the names formed with each suffix separately, we find some of the same patterns. Table 3.11 shows the number and percentage of adjective lemmas and tokens in each time period; the percentage of each is also shown in Figure 3.9.

Table 3.11 Adjective Lemmas and Tokens by Date of Text

Lemmas
Period     -art         -enc         -esc         -ivol        Total
11th c     0 (0%)       0 (0%)       5 (100%)     0 (0%)       5
1100-1150  0 (0%)       0 (0%)       1 (100%)     0 (0%)       1
1150-1200  14 (82.6%)   3 (17.7%)    0 (0%)       0 (0%)       17
1200-1250  34 (57.6%)   11 (18.6%)   14 (23.7%)   0 (0%)       59
1250-1300  61 (67.0%)   18 (19.8%)   10 (11.0%)   2 (2.2%)     91
1300-1350  63 (31.7%)   44 (22.1%)   31 (15.6%)   61 (30.7%)   199
1350-1400  23 (34.3%)   14 (20.9%)   15 (22.4%)   15 (22.4%)   67
1400-1450  22 (18.8%)   29 (24.8%)   3 (2.6%)     63 (53.9%)   117
1450-1500  26 (23.9%)   36 (33.0%)   3 (2.8%)     44 (40.4%)   109

Tokens
Period     -art         -enc         -esc         -ivol        Total
11th c     0 (0%)       0 (0%)       7 (100%)     0 (0%)       7
1100-1150  0 (0%)       0 (0%)       7 (100%)     0 (0%)       7
1150-1200  20 (87.0%)   3 (13.0%)    0 (0%)       0 (0%)       23
1200-1250  104 (68.9%)  19 (12.6%)   28 (18.5%)   0 (0%)       151
1250-1300  289 (75.5%)  65 (17.0%)   27 (7.1%)    2 (0.5%)     383
1300-1350  276 (36.6%)  137 (18.2%)  71 (9.4%)    271 (35.9%)  755
1350-1400  178 (62.0%)  47 (16.4%)   45 (15.7%)   17 (5.9%)    287
1400-1450  54 (11.1%)   65 (13.4%)   3 (0.6%)     365 (75.0%)  487
1450-1500  114 (29.2%)  86 (22.0%)   10 (2.6%)    181 (46.3%)  391

[Figure: two stacked bar panels, "Adjective Lemmas by Date" and "Adjective Tokens by Date"; y-axis 0% to 100%; series -art, -enc, -esc, -ivol]

Figure 3.9 Percentages of Adjective Lemmas and Tokens with Each Suffix by Date of Text

Table 3.11 and Figure 3.9 show that some of the patterns found in the total derived forms are present here, but others are not. For example, the increase in the use of -ivol in the later time periods, though an artifact of the dialect differences, is clearly present. This increase is a stronger trend in token frequency than in lemma frequency. Similarly, the decline in the use of the -art suffix is clearly present and appears at least as strong as in the derived forms as a whole. This tells us that the decline is not a general one but specifically a decline in the use of -art to derive adjectives. The pattern of -art in deriving names may not be influenced by the date of the text. The increase in -enc discussed above, on the other hand, is slight when only adjectives are considered. It is not significant in the lemmas and tokens of adjectives found in each time period.12

The decrease in the use of -art and -esc during the Old Occitan period can also be seen in roots to which two or more of these suffixes are attached in the corpus, such as noitart and noitenc ‘relating to the night’. Derivations of this root appear with -art early in the Old Occitan period but with -enc in later time periods. It is reasonable to assume that the form noitart, which is very transparent and has a low token frequency in the COM corpus, is parsed rather than stored. If so, the use of noitenc rather than noitart in later texts suggests that the productivity of the suffix -art was declining. The suffix was apparently less available later in the Old Occitan period for producing new, or parsed, forms such as noitart.

Similarly, the token counts for most words derived with -esc are too low to support chronological conclusions about any individual word. It is still worth noting a pattern among the forms that alternate between derivations with two or more of the suffixes discussed here. No tokens with -esc are found among these forms in fifteenth-century texts, though -esc accounts for more than 20% of the forms derived from these roots in texts before 1300.

12 Because some of the numbers are small, the chi-squared trend test is not appropriate here. Using Fisher’s exact test instead, neither the lemmas (p = 0.3522) nor the tokens (p = 0.0542) show a statistically significant difference.
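Footnote 12 gives small numbers as the reason for switching from the trend test to Fisher's exact test. One way to see which periods are too sparse is to compute the expected number of -enc and non-enc adjective lemmas in each period under the null hypothesis of no change. The Python sketch below assumes the common rule of thumb that expected counts below 5 make the chi-squared approximation unreliable; the footnote does not say which criterion was applied. The helper name is hypothetical. With the counts in Table 3.11, the three earliest periods fall below that threshold.

```python
# Adjective lemmas per period (Table 3.11): totals and the number in -enc.
PERIODS = ["11th c", "1100-1150", "1150-1200", "1200-1250", "1250-1300",
           "1300-1350", "1350-1400", "1400-1450", "1450-1500"]
TOTALS = [5, 1, 17, 59, 91, 199, 67, 117, 109]
ENC = [0, 0, 3, 11, 18, 44, 14, 29, 36]


def sparse_periods(successes, totals, labels, minimum=5):
    """Return labels of groups whose expected count, with or without the
    feature, falls below `minimum` under the no-trend null hypothesis."""
    p_bar = sum(successes) / sum(totals)
    flagged = []
    for label, n in zip(labels, totals):
        expected_with, expected_without = n * p_bar, n * (1 - p_bar)
        if min(expected_with, expected_without) < minimum:
            flagged.append(label)
    return flagged


print(sparse_periods(ENC, TOTALS, PERIODS))  # -> ['11th c', '1100-1150', '1150-1200']
```

Merging the sparse early periods would be another way to rescue the chi-squared approximation. An exact test avoids that choice altogether, which may be why the footnote uses one.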

When we compare the date-related trends in the adjectives to those in all derived forms, one difference is noteworthy. The increase in forms derived with -enc found in the total derived forms is not present when only the adjectives are investigated. We might therefore expect to find a significant increase in -enc in the names found in the texts. Unsurprisingly, this is exactly what is found, as shown in Table 3.12 and Figure 3.10.

Table 3.12 Name Lemmas and Tokens by Date of Text

Lemmas
Period     -art         -enc         -esc        Total
11th c     1 (50.0%)    0 (0%)       1 (50.0%)   2
1100-1150  15 (83.3%)   3 (16.7%)    0 (0%)      18
1150-1200  16 (80.0%)   4 (20.0%)    0 (0%)      20
1200-1250  54 (68.4%)   23 (29.1%)   2 (2.5%)    79
1250-1300  49 (65.3%)   22 (29.3%)   4 (5.3%)    75
1300-1350  58 (57.4%)   42 (41.6%)   1 (1.0%)    101
1350-1400  49 (77.8%)   12 (19.0%)   2 (3.2%)    63
1400-1450  20 (66.7%)   10 (33.3%)   0 (0%)      30
1450-1500  20 (40.8%)   27 (55.1%)   2 (4.1%)    49

Tokens
Period     -art         -enc         -esc        Total
11th c     1 (50.0%)    0 (0%)       1 (50.0%)   2
1100-1150  108 (96.4%)  4 (3.6%)     0 (0%)      112
1150-1200  53 (91.4%)   5 (8.6%)     0 (0%)      58
1200-1250  899 (94.3%)  51 (5.4%)    3 (0.3%)    953
1250-1300  486 (75.8%)  150 (23.4%)  5 (0.8%)    641
1300-1350  360 (76.8%)  107 (22.8%)  2 (0.4%)    469
1350-1400  246 (88.8%)  19 (6.9%)    12 (4.3%)   277
1400-1450  177 (88.9%)  22 (11.1%)   0 (0%)      199
1450-1500  123 (68.3%)  55 (30.6%)   2 (1.1%)    180

[Figure: two panels, “Name Lemmas by Date” and “Name Tokens by Date”; y-axis 0% to 100%; series -art, -enc, -esc]

Figure 3.10 Percentages of Name Lemmas and Tokens with Each Suffix by Date of Text

The increase in the use of -enc to derive words, which was only slight in the adjectives, is clearly significant in the pattern of names. For lemmas, a chi-squared trend test gives χ² = 24.481, p = 0.0009; for tokens, χ² = 50.472, p < 0.0001. As is necessarily the case with percentages, this increase is matched by a corresponding decrease in the use of names in -art.
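The text calls this a chi-squared trend test but does not show the computation. The Python sketch below computes an ordinary Pearson chi-squared test of homogeneity on a 2 × k table, comparing forms with -enc against all other forms in each period. It uses only the standard library, and the function names are illustrative.

The sketch was applied to the -enc name-lemma counts in Table 3.12 for the eight half-century periods from 1100-1150 to 1450-1500. The two-lemma eleventh-century row was left out, which is an assumption made here. Under those conditions the sketch yields the χ² of 24.481 (7 degrees of freedom) and the p of 0.0009 reported above for the lemmas. The match suggests, but does not prove, that the reported lemma figures come from a test of this form.

This test treats the periods as unordered categories. A test that uses their order, such as the Cochran-Armitage trend test, would give a different value. For tables larger than 2 × 2, `scipy.stats.chi2_contingency` computes the same Pearson statistic.

```python
from math import erfc, exp, factorial, pi, sqrt


def chi2_sf(x, df):
    """P(X >= x) for a chi-squared variable X with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        return exp(-x / 2) * sum((x / 2) ** j / factorial(j) for j in range(df // 2))
    # Odd df: normal tail plus a finite series in chi = sqrt(x),
    # with terms chi^(2r-1) / (1 * 3 * ... * (2r-1)) for r = 1 .. (df-1)/2.
    chi = sqrt(x)
    series, term = 0.0, chi
    for r in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * r + 1)
    return erfc(chi / sqrt(2)) + sqrt(2 / pi) * exp(-x / 2) * series


def chi2_homogeneity(successes, totals):
    """Pearson chi-squared test that one proportion is the same in k groups.

    Each group contributes two cells (forms with the suffix, all other
    forms). Assumes the pooled proportion is strictly between 0 and 1.
    Returns (statistic, degrees of freedom, p-value).
    """
    p_bar = sum(successes) / sum(totals)
    stat = 0.0
    for s, n in zip(successes, totals):
        for observed, expected in ((s, n * p_bar), (n - s, n * (1 - p_bar))):
            stat += (observed - expected) ** 2 / expected
    df = len(totals) - 1
    return stat, df, chi2_sf(stat, df)


# -enc name lemmas and all name lemmas, 1100-1150 through 1450-1500 (Table 3.12).
enc = [3, 4, 23, 22, 42, 12, 10, 27]
names = [18, 20, 79, 75, 101, 63, 30, 49]
stat, df, p = chi2_homogeneity(enc, names)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.4f}")  # chi2 = 24.481, df = 7, p = 0.0009
```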

Although the trend looks more consistent in the name lemmas than in the name tokens, this is only because of the very high token frequency of names in -art. This increase in names in -enc, however, largely reflects the dialect differences discussed in the previous section. Texts in the Waldensian dialect occur only in the later time periods. This dialect was found to have a different pattern of names from the other dialects: a much smaller percentage of names in -art and a much larger percentage of names in -enc. Its influence is particularly strong in the last time period, in which the longest text is the Waldensian Nouveau Testament Vaudois de Zurich. This text contains a large number of names for groups of people derived with -enc, such as Sodomienc, Israellitienc, Levitienc, Ciprienc, and Cirinienc, and very few names derived with -art. The dialect does not account for all of the variation in the pattern of names in -art and -enc, but it accounts for a great deal of it. The change that the dialect leaves unaccounted for is interesting but very slight, particularly in the case of -art.
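The contrast just described depends on the difference between lemma counts and token counts. A single very frequent name can dominate the token figures while counting only once among the lemmas. The Python sketch below makes this concrete with invented attestation data. The lemma labels are hypothetical placeholders, not forms from the COM, and the one-record-per-token input format is an assumption.

```python
from collections import Counter


def suffix_shares(attestations):
    """Return (lemma shares, token shares) per suffix.

    attestations: list of (lemma, suffix) pairs, one pair per token.
    Lemma shares count each distinct lemma once; token shares count
    every occurrence.
    """
    tokens = Counter(suffix for _, suffix in attestations)
    lemmas = Counter(suffix for _, suffix in set(attestations))
    n_tok, n_lem = sum(tokens.values()), sum(lemmas.values())
    lemma_share = {s: lemmas[s] / n_lem for s in lemmas}
    token_share = {s: tokens[s] / n_tok for s in tokens}
    return lemma_share, token_share


# Hypothetical data: one -art name attested 8 times, another once,
# and two -enc names attested once each.
data = [("art_name_1", "-art")] * 8 + [
    ("art_name_2", "-art"),
    ("enc_name_1", "-enc"),
    ("enc_name_2", "-enc"),
]
lemma_share, token_share = suffix_shares(data)
print(lemma_share["-art"], round(token_share["-art"], 3))  # 0.5 0.818
```

In this toy example, -art and -enc are evenly split among lemmas, but -art makes up over 80% of tokens because of one frequent name.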

When we analyze the use of forms derived with these suffixes in terms of the date of the text, only one strong trend is clearly associated with date: the decline in the use of adjectives derived with -art. There is also a weaker date-related decrease in forms derived with -esc. The other apparently strong trends shown here can be traced back to dialect differences, particularly those involving the Waldensian dialect.

3.6 Analysis of Derived Adjectives by Text Type

Finally, in this section I consider the patterns of use of the suffixes -art, -enc, -esc, and -ivol¹³ in terms of the type of text. I separate lyric poetry, non-lyric poetry, and prose, following Ricketts’s division of the texts in the tranches of the COM into types, as described in Section 1.5. Table 3.13 and Figure 3.11 show the number and percentage of lemmas and tokens found with each suffix.

Table 3.13 Total Derived Lemmas and Tokens by Text Type

Lemmas
Text type           -art         -enc         -esc         -ivol        Total
Lyric poetry         42 (53.8%)   18 (23.1%)   18 (23.1%)    0 (0%)         78
Non-lyric poetry     70 (46.7%)   41 (27.3%)   25 (16.7%)   14 (9.3%)      150
Prose               254 (36.9%)  255 (37.2%)   87 (12.7%)   89 (13.0%)     685

Tokens
Text type           -art            -enc            -esc         -ivol         Total
Lyric poetry          303 (78.7%)      32 (8.3%)      50 (13.0%)    0 (0%)         385
Non-lyric poetry    1,405 (81.9%)     225 (13.1%)     69 (4.0%)    16 (0.9%)     1,715
Prose               4,947 (69.2%)   1,071 (15.0%)    298 (4.2%)   830 (11.6%)    7,146

13 The suffix -al was also analyzed in terms of text type and, as with the other parameters, no difference was found in the pattern of its use.

[Figure: two panels, “Lemmas by Text Type” and “Tokens by Text Type”; y-axis 0% to 100%; x-axis Lyric Poetry, Non-lyric Poetry, Prose; series -art, -enc, -esc, -ivol]

Figure 3.11 Percentages of Total Derived Lemmas and Tokens with Each Suffix by Text Type

Just as in the previous sections, we find clear differences in the patterns of usage of these suffixes. Also as in Section 3.5, the very clear difference in the use of the suffix -ivol is not actually a text type difference. It is, again, a reflection of the dialect difference introduced in Section 3.4. The -ivol suffix occurs primarily in the Waldensian dialect, and the text type difference in its use is simply an artifact of that dialect difference.

The lyric poetry does not, in general, have a secure regional location. This is partly because of the manuscript problems introduced with reference to geography in Section 1.7.2. It is also partly because the lyric poetry seems to be deliberately written in an “artistic language” (Jensen 1995). Scholars who have tried to tie the lyric poetry to a geographic region have concluded that most of its features are most closely related to the Languedoc dialect (Jensen 1995, Smith 1995), but features of other dialects are found throughout the poems. Given the intentional association of the lyric poetry with the central dialects and the culture they represent, it is not surprising that no adjectives in -ivol occur in the lyric poetry. In addition, given the transparency of the attested forms, it is possible that -ivol became productive in the Waldensian dialect only recently, during the Old Occitan period itself. Because the lyric poetry was composed relatively early in the Old Occitan period, later developments in adjective-deriving suffixes would not have been available to the poets.

It also looks as though -ivol is used more frequently in prose than in the non-lyric poetry, but this is again only an artifact of the dialect difference. There are several very long prose texts but only one non-lyric poetry text in the Waldensian dialect. This effect is produced by the larger proportion of Waldensian texts in the prose, not by a difference in usage directly related to text type. This becomes clear when we examine the particular texts in which the suffix -ivol is used. It occurs almost exclusively in texts from the Waldensian dialect area. The few other texts in which it occurs are of uncertain geographic origin and may well be from the Waldensian region.

There are, however, other important differences that seem to be based on the text types themselves rather than being artifacts of other differences. For example, the suffix -esc is used far more frequently in the lyric poetry texts than in the non-lyric poetry and prose texts. This difference is most highly significant in the token frequency of -esc (p < 0.0001), and it is also significant in the lemma frequency (p = 0.0032). The prose texts use the fewest forms in -esc. When the prose is tested against the lyric and non-lyric poetry together, this trend is statistically significant for the tokens (p = 0.0012) but not for the lemmas (p = 0.0282). The non-lyric poetry, however, is not significantly different from the other two text types combined (for lemmas, p = 0.3709; for tokens, p = 0.3116). The non-lyric poetry texts use more words in -esc than the prose texts but fewer than the lyric poetry texts. It is therefore not surprising that they do not differ significantly from the other two text types combined.

When considered together, the three-way difference in the distribution of forms in -esc is significant for the tokens (p < 0.0001), though not for the lemmas (p = 0.0296). This text type difference may have its roots in the relatively low frequency of the suffix -esc in the Old Occitan corpus as a whole. The lyric poets were known for word play and may have chosen less common forms deliberately.

The other important difference partly based on text type is in the use of -art and -enc. Words derived with the suffix -art account for the largest percentage of lemmas and tokens in the lyric poetry texts, followed by the non-lyric poetry texts and finally the prose texts. The three-way difference in the use of -art is significant (for lemmas, p = 0.0033; for tokens, p < 0.0001). The opposite pattern is found with -enc. Forms derived with -enc are used most often in the prose texts, in terms of both lemma frequency and token frequency. They are followed by the non-lyric poetry texts and finally the lyric poetry texts (for lemmas, p = 0.0062; for tokens, p = 0.0004). Some of this difference again reflects the Waldensian dialect and is discussed below with the names.

When we separate the adjectives from the names in each suffix, we find quite different patterns in the adjectives, as shown in Table 3.14 and Figure 3.12.


Table 3.14 Adjective Lemmas and Tokens by Text Type

Lemmas Tokens -art -enc -esc -ivol Total -art -enc -esc -ivol Total Lyric 18 13 17 0 48 84 25 49 0 158 Poetry 37.5% 27.1% 35.4% 0% 53.2% 15.8% 31.0% 0% Non-lyric 28 22 23 14 87 221 46 65 16 348 Poetry 32.2% 25.3% 26.4% 16.1% 63.5% 13.2% 18.7% 4.6% Prose 103 106 58 89 356 907 530 224 830 2491 28.9% 29.8% 16.3% 25.0% 36.4% 21.3% 9.0% 33.3%

[Figure: two panels, “Adjective Lemmas by Text Type” and “Adjective Tokens by Text Type”; y-axis 0% to 70%; x-axis Lyric Poetry, Non-lyric Poetry, Prose; series -art, -enc, -esc, -ivol]

Figure 3.12 Percentages of Adjective Lemmas and Tokens with Each Suffix by Text Type

Apart from the difference in the use of -ivol, the pattern of adjective lemmas derived with each suffix seems mostly stable across text types, showing a significant difference only in the use of -esc. As expected from the preceding discussion of all derived forms, the largest percentage of adjective lemmas with -esc occurs in the lyric poetry, followed by the non-lyric poetry and then the prose. The suffix -esc is used significantly more frequently in the lyric poetry than in the other two text types (p = 0.0075). It is used significantly less frequently in the prose texts than in the other two (p = 0.0015). As with all derived forms, the non-lyric poetry is not significantly different (p = 0.1045), because its pattern falls between those of the prose and the lyric poetry. When considered together, the three-way difference in the use of -esc is statistically significant (p = 0.002).

The adjective tokens, however, are less consistent across the text types. The suffix -art has a much higher token frequency in the non-lyric poetry than in the prose. The suffix -enc, along with -ivol, accounts for a much larger percentage of the tokens in the prose than in either of the other text types. The association of -esc with poetry, and with lyric poetry in particular, is even more highly significant in the adjective tokens than in the adjective lemmas. The high use of -esc in the lyric poetry and the low use of -esc in the prose are both highly statistically significant (p < 0.0001 in both cases), and so is the three-way contrast (p < 0.0001).

In the names, on the other hand, this association of -esc with the lyric poetry does not occur. There are relatively few names in -esc in any of the text types, and they are in fact most common in the prose. This shows that adjective derivation with -esc is fairly strongly associated with the lyric poetry, but name creation with -esc is not. Indeed, the association of adjectives in -esc with lyric poetry is one of the most striking text type differences, yet text type seems to make no difference at all for names in -esc (p = 1). The number and percentage of name lemmas and tokens derived with each suffix are shown in Table 3.15 and Figure 3.13.

Table 3.15 Name Lemmas and Tokens by Text Type

Lemmas
Text type           -art         -enc         -esc        Total
Lyric poetry         13 (65.0%)    6 (30.0%)    1 (5.0%)      20
Non-lyric poetry      5 (17.9%)   21 (75.0%)    2 (7.1%)      28
Prose               171 (46.1%)  169 (45.6%)   31 (8.4%)     371

Tokens
Text type           -art            -enc           -esc         Total
Lyric poetry          219 (96.48%)     7 (3.08%)      1 (0.44%)      227
Non-lyric poetry    1,184 (86.61%)   179 (13.09%)     4 (0.29%)    1,367
Prose               4,040 (86.79%)   541 (11.62%)    74 (1.59%)    4,655

[Figure: two panels, “Name Lemmas by Text Type” and “Name Tokens by Text Type”; y-axis 0% to 100%; x-axis Lyric Poetry, Non-lyric Poetry, Prose; series -art, -enc, -esc]

Figure 3.13 Percentages of Name Lemmas and Tokens with Each Suffix by Text Type

Here, we find a very different pattern from that of the adjectives. Aside from the complete lack of a correlation between the lyric poetry and -esc in names, the most striking difference is the large percentage of distinct names in -enc found in the non-lyric poetry. The percentage in the non-lyric poetry is significantly higher than in the other two text types (p = 0.0027). The use of -enc in the non-lyric poetry is also significantly different from that in the lyric poetry (p = 0.0040). The prose, however, is not statistically different from the two types of poetry (p = 0.1700). This lack of significance is unsurprising, as the proportion of names derived with -enc in the prose falls between those of the lyric and non-lyric poetry. The three-way contrast in the distribution of names with -enc across the lyric poetry, non-lyric poetry, and prose is also statistically significant (for lemmas, p = 0.0055; for tokens, p < 0.0001). Unlike the apparent date-based difference in names with -enc, this contrast cannot be attributed to the dialect difference at all. The dialect difference regarding this suffix was its association with the Waldensian dialect, but the majority of our Waldensian texts are prose texts.

If the differences we see here merely reflected the dialect differences, we would expect the largest percentage of names in -enc in the prose texts; instead, we find it in the non-lyric poetry texts. The dialect difference may account for some of the contrast in -enc between the prose and the lyric poetry, since the suffix occurs in more names in the former than in the latter. There is, however, a further difference that dialect does not explain: names in -enc are more frequent in the non-lyric poetry than in either the prose or the lyric poetry.

In addition to the date and the dialect of the text, the type of text does seem to influence the pattern of derivation in noticeable ways. Adjectives with -esc occur most often in the lyric poetry, and names in -enc are most common in the non-lyric poetry.

3.7 Conclusion

In this analysis, all three parameters examined were found to have some impact on the pattern of use of the suffixes -art, -enc, -esc, and -ivol. By far the most pervasive are dialect differences, particularly those between the Waldensian dialect and the rest of the Old Occitan dialects. I have confirmed that the use of -ivol is indeed dialectally constrained. Two features create a very different pattern of usage based on the dialect of the texts. The first is the use of -ivol almost exclusively in the Waldensian dialect. The second is that dialect’s preference for adjectives and names in -enc rather than -art.

Although less striking, the date of the text also affects the pattern of usage of these adjectives, particularly of adjectives with the suffix -art. Adjectives with this suffix, while still quite common, become steadily and considerably less frequent through the second half of the Old Occitan period. Similarly, but to a lesser extent, words derived with -esc also become less frequent over time.

Finally, the type of text affects the pattern of derivation in two ways. First, adjectives derived with -esc, though not names, are significantly more common in the lyric poetry. This may reflect the lyric poets’ appreciation for less common words and for coining new ones. Second, names formed with -enc are significantly more frequent in the non-lyric poetry.

In Chapter Two, only the type of text was found to account for the pattern of usage of the analytic and synthetic comparative formations. In this analysis of adjective derivation, however, the date and the dialect of a text, as well as the text type, give important insights into the use of these derivational suffixes. While many of these differences are very subtle, they nonetheless give a more complete picture of how these derivational suffixes are used in Old Occitan.

Chapter Four: Glide-Initial Diphthongs

4.0 Glide-Initial Diphthongs

After considering the text type differences found in the usage of an aspect of the morphosyntax and an aspect of the morphology of Old Occitan, I turn now to the text type differences that can be found in a phonological feature: the use of the glide-initial diphthongs. In Old Occitan, glide-initial diphthongs are found in words where a mid vowel would be etymologically expected. In many cases, forms with diphthongs occurred alongside those with simple vowels. Examples of words in which developments of [je] from /e/ occur are given in (1).

(1)  mieg ‘half’           mielh ‘better’        brievitat ‘soon’
     ieu ‘I’               gielar ‘to freeze’    dieu ‘god’
     mieis ‘mine’          tiemper ‘storm’       gienh ‘people’
     parielha ‘together’

The glide-initial diphthongs that developed from back mid vowels had two outcomes: [we] and [wo]. Examples of this diphthongization of back mid vowels are given in (2).

(2)  luec ‘place’          suegre ‘father-in-law’   fuoc ‘fire’
     uelh ‘eye’            tueys ‘poison’           fuelh ‘leaf’
     luoctonen ‘deputy’    vuolp ‘fox’              nueu ‘new’

4.1 Glide-Initial Diphthongs in Grammars and Descriptions of Old Occitan

The development of these diphthongs is generally agreed to have occurred late, just before the Old Occitan period, as one of the last changes in the progression from Vulgar Latin to Old Occitan. Grafström (1958) notes that no diphthongs occur (at least orthographically) in the Canson de Sainte Foy and that only one (uel ‘eyes’) is found in the Boeci (1000 A.D.). On these grounds, he argues that the diphthongization was underway in the eleventh century. The eleventh century, however, seems slightly too late to account for the presence of diphthongs in the troubadour poetry. Even in the earliest poems, just after 1000 A.D., the diphthongs appear consistently in many words (Paden 1998). Thus, despite Grafström’s evidence, many scholars agree that the diphthongization must have occurred before 1000 A.D. Grandgent, for example, ascribes the development to the period between the seventh and tenth centuries “with some confidence” (p. 20).

The glide-initial diphthongs have received a great deal of attention, not only in studies of Old Occitan but also in studies of other Romance languages. Changes similar in description, but less troublesome to analyze, occurred in other branches of Western Romance. In Spanish, /e/ became [je] and /o/ became [we] in stressed syllables almost without exception. In French and Italian, the mid vowels diphthongized in open stressed syllables, again almost without exception, with different outcomes in the two languages. In Catalan, which was phonologically very similar to Old Occitan, /e/ became [je] before a palatal and /o/ became [we] only before //. The development of these very similar changes in closely related languages suggests some variability, or at least vulnerability, in the source language, leading to “drift” among the daughters (Croft 2006, Joseph 2006). Penny (2002) has proposed that the Spanish diphthongization may have resulted from Germanic influence on the length and quality of tonic vowels, which were then reinterpreted as diphthongs. This proposal could be extended to the other Romance languages in question, all of which had contact with Germanic languages.

The development of the glide-initial diphthongs in Old Occitan is less clearly explained, and there is no consensus among Romance linguists. For example, Mendeloff’s Manual of Comparative : Phonology and Morphology states that the mid vowels did not diphthongize in Old Occitan,¹ but even a cursory glance at the Old Occitan texts reveals glide-initial diphthongs in some words.

1 It is not clear why Mendeloff makes this claim or on what basis.

Many linguists, recognizing that these diphthongs occurred in some but not all words, have suggested various conditioning environments and explanations for their development. Matzke (1898) argues that glide-initial diphthongs developed from /e/ and /o/ in open syllables as a form of lengthening. He discusses the syllabification rules of early French and Occitan at length, exploring the exact conditions that resulted in an open syllable and thus a diphthong. He does not, however, discuss monosyllabic words, many of which are closed syllables that frequently occur with diphthongs (for example, mielhs ‘better’, ciel ‘heaven’, and luenh ‘far’). There are also many multisyllabic words in which diphthongs developed in syllables that are closed by his definition. In another early analysis, Tuttle (1919) argues that diphthongs developed from mid vowels followed by /i/ or /u/, as either vowels or glides, in the same syllable or the following one, as a form of vowel height harmony. This analysis accounts for some of the most common forms, such as ieu ‘I’ and nientir ‘to deny’. However, there are many words in which diphthongs occurred without a following high vowel or glide, for instance luec ‘place’.

Other scholars have used the division of the Old Occitan mid vowels into high mid and low mid vowels in trying to account for the words that develop diphthongs. Old Occitan continued the seven-vowel system of Late Latin, which had four vowel heights, as schematized in (3):

(3)   i           u
       e       o
        ɛ     ɔ
           a

Paden (1998) argues that only the low mid vowels (/ɛ/ and /ɔ/) developed into diphthongs universally before glides. Others argue that diphthongs developed from the low mid vowels in stressed syllables only in content words, mostly nouns and verbs. On the other hand, Boyd-Bowman (1980) notes that it is the high mid vowels (/e/ and /o/), rather than the low mid vowels, that developed diphthongs before palatal consonants.

Alibert (1935) also argues that the diphthong development was conditioned by following palatal sounds, but explicitly states that all mid vowels, both the high mid vowels and the low mid vowels, developed diphthongs in this context. In a later work, he expands the conditioning environment to include both the glides /j/ and /w/ and the velar stops /k/ and /g/ alongside the palatals (Alibert 1965). Harris and Vincent’s (1988) analysis is similar in its conditioning by following palatals, but they argue that the diphthongization of the mid vowels was optional. Meyer-Lübke (1923) discusses the influence of palatal and velar sounds both preceding and following the vowel as triggering the change.

Grandgent introduces the problem of dialect into his analysis. In his 1919 grammar, he explains that diphthong formation occurred in some dialects before some combination of /i/, /u/, and the palatal consonants, but that there was no coherent set of conditioning criteria for the language as a whole. He also notes that diphthongization was least common in the southwest of the region, in the Gascon dialect area.

Dialect has also been argued to be important for determining the outcome of the back diphthong. The Gascon, Limousin, and Auvernhat dialects preferred the [wo] variant, while both this variant and the fronted [we] variant have been noted for the Languedoc and Provençal dialects (Anglade 1977). The pattern of usage of the [wo] and [we] variants in the Old Occitan texts is considered in Section 4.3.3.1 below, and the role that dialect plays in this distribution is discussed in Section 4.4.

Many other grammars, handbooks, and papers have given conditioning environments for the development of these diphthongs similar to one or more of those summarized here. Many of them have focused on the proximity of palatal sounds, while others have simply noted in passing that the diphthongs occur in some words and not in others. This latter observation may be true synchronically, but it is not a satisfactory account. The other accounts, however, are not satisfactory either: taken together, they do not fully account for the data. Further, the contradictions between these explanations show, as perhaps the description of the problem alone does not, that there is something both problematic and interesting here.

It is not the purpose of this thesis to add yet another account to the discussion. However, my own study of the development of the glide-initial diphthongs (Wilson 2010) found that diphthongs did develop from both low mid vowels and high mid vowels (dieu ‘god’ and frieg ‘cold’). Diphthongs are found in both content words and function words (tueys ‘poison’ and depueis ‘after’). Diphthongs also developed in both open and closed syllables (muelhar ‘mother’ and fuelh ‘leaf’), and both before and after every sound in Old Occitan.

Despite this, the development of the diphthongs was not random. The diphthongs developed with a core of regularity, in which almost all of the candidates for the change developed diphthongs. This core of regularity was narrowly defined. The low mid vowels (/ɛ/ and /ɔ/) developed diphthongs in stressed monosyllabic words when followed by a palatal sound, as in fuec ‘fire’, uelh ‘eye’, and mieg ‘half’. Over 95% of these candidate words appear with a diphthong in the Old Occitan corpus. I have elsewhere advocated a “Big Bang Theory” approach (Janda and Joseph 2003) to explain this core of regularity (Wilson 2010). The approach also explains how the diphthongization spread, in a principled way, to phonologically similar environments and morphologically related words. The core regularity is similar to the conditioning environments suggested by Paden, Meyer-Lübke, and others, and it represents the sound change itself, the “Big Bang”. The other words that developed diphthongs in the aftermath of this core sound change, however, cannot be ignored.

From this core of regularity, the glide-initial diphthongs spread through the lexicon by various avenues of analogy with words that were part of the “Big Bang”. In fact, the number of words that developed diphthongs by analogy is much larger than the number of words in the core of regularity. In some cases, the analogy involved words with similar sounds, which had the effect of seemingly relaxing the narrow conditioning environment of the original change. The levels of analogy described here are not intended as a chronological sequence. They are an ordering by regularity of outcome and by nearness to the core, in terms of the stringency of the conditioning factors. The actual spread of the glide-initial diphthongs through the lexicon, however, need not have been so regimented. Suppose the analogical spread is conceptualized as concentric circles. It is quite possible, for instance, that one side of the circle bulged outward quickly, following related words or derivational analogy, and then slowed while other parts of the circle caught up.

Outside the “Big Bang” core, a large percentage of the high mid vowels in the same conditioning environment (a stressed monosyllabic word before a palatal sound) developed diphthongs. They did so by analogy with the words in which the low mid vowels had diphthongized, as in frieg ‘cold’. This development, however, was not a regular sound change, and there are many exceptions.

In addition, the mid vowels, both low mid and high mid, developed glide-initial diphthongs before sonorant consonants. This presumably happened by analogizing the diphthongs before dental and labial sonorants to those before palatal sonorants. This avenue of analogy gave diphthongs in words such as bien ‘good, well’ and miel ‘honey’. Only about half of the words in which the mid vowels occur in this environment developed diphthongs. At a similar “distance” from the core of regularity, and with a similar proportion of words undergoing the change, stressed mid vowels followed by palatal sounds developed diphthongs in multisyllabic words, such as perfiech ‘perfect’ and ueimais ‘from now on’. Again by analogy, a smaller percentage of stressed mid vowels followed by sonorants in multisyllabic words, such as arcangiel ‘archangel’ and tiemple ‘temple’, developed the initial glide.

Finally, furthest from the central core of regularity, a much smaller proportion of mid vowels in open syllables developed into diphthongs, as in gliera/glieja ‘church’ and suegre ‘father-in-law’. In addition, a number of forms morphologically related to forms in the core of regularity developed glide-initial diphthongs by analogy or paradigm leveling, regardless of the phonological shape of the word. For example, mielher ‘better’ occurs with a diphthong by analogy to mielhs ‘best’.
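The concentric rings described above can be summarized as an ordered classification of candidate environments. The Python sketch below is a hypothetical encoding for illustration, not a procedure from Wilson (2010). Several details are assumptions:

- the feature names;
- the three-way split of following sounds into palatal, sonorant, and other;
- treating the sonorant ring as monosyllabic;
- the restriction to stressed vowels.

The ring numbers only rank distance from the core; as noted above, they are not a chronology. Forms such as mielher, which acquired a diphthong through morphological analogy, fall outside any phonological ring and are not modeled.

```python
from typing import Optional

LOW_MID = {"ɛ", "ɔ"}
HIGH_MID = {"e", "o"}


def ring(vowel: str, stressed: bool, monosyllabic: bool,
         following: str, open_syllable: bool = False) -> Optional[int]:
    """Hypothetical distance from the 'Big Bang' core (0 = core).

    following: "palatal", "sonorant" (non-palatal), or "other".
    Returns None for contexts the ring description does not cover.
    """
    if vowel not in LOW_MID | HIGH_MID or not stressed:
        return None  # assumption: only stressed mid vowels are candidates
    if monosyllabic and following == "palatal":
        return 0 if vowel in LOW_MID else 1  # core; high mid by analogy
    if monosyllabic and following == "sonorant":
        return 2  # e.g. bien, miel
    if not monosyllabic and following == "palatal":
        return 2  # e.g. perfiech, ueimais
    if not monosyllabic and following == "sonorant":
        return 3  # e.g. arcangiel, tiemple
    if open_syllable:
        return 4  # e.g. gliera, suegre
    return None


print(ring("ɔ", True, True, "palatal"))  # 0 (the core, as in fuec)
print(ring("e", True, True, "palatal"))  # 1 (high mid, as in frieg)
```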

The “Big Bang” can thus be used to account for the development of diphthongs in particular words. Using this approach, the core of regularity can be identified as the earliest change, and the various avenues of analogy, based primarily on similarity of sound sequences and morphologically related words, can be explored. This approach, like the explanations of others described above, does not fully and accurately describe the problem of the glide-initial diphthongs in Old Occitan. Even though the development of the glide-initial diphthongs in particular words is accounted for, the issue of the actual occurrence of forms with diphthongs in the texts remains to be considered, and an analysis of the usage of the glide-initial diphthongs may shed more light on an issue that has troubled Romance linguists for decades.

Paden (1998) gives an account of the development of the diphthongs and examples of words in which they occur. He also notes, however, that “scribes employed these diphthongs in a seemingly random fashion” (p. 11). He offers two possibilities. Either the spellings with monophthongs actually represent diphthongized pronunciations, or speakers “pronounced either the simple vowel or the diphthong indifferently” (p. 103). Others, however, have suggested that Old Occitan was pronounced more or less exactly as written (e.g. Wayland 1982). Although Paden regards the possibility of speaker variation as the less satisfactory one, it is this possibility that I focus on in the remainder of the chapter.

We know from modern sociolinguistic studies that such variation does exist. Well-known studies by Labov and many others have shown variation in pronunciation based on regional dialect, socio-economic class, age, gender, and other factors. The uniformitarian principle (Labov 1972) suggests that the same was possible in languages such as Old Occitan. Information on parameters such as age and socio-economic class is not available for speakers of medieval languages. The rest of this chapter therefore analyzes the patterns of the glide-initial diphthongs in the texts according to some of the factors for which information is available. The distributions of the diphthong and monophthong variants are analyzed by dialect, date, and text type in Sections 4.4, 4.5, and 4.6 respectively. The aim is to explore to what extent the use of one variant or the other can be explained by these parameters rather than being simply random.
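The analysis outlined here amounts to tabulating, for each value of a parameter, the share of attestations written with a diphthong. The minimal Python sketch below shows that bookkeeping with invented records. Several details are assumptions rather than the COM’s own format:

- the record fields;
- the half-open half-century bins;
- the label used for texts before 1100.

```python
from collections import defaultdict


def period(year: int) -> str:
    """Half-century label such as "1150-1200" (bins assumed half-open)."""
    if year < 1100:
        return "before 1100"
    start = year - year % 50
    return f"{start}-{start + 50}"


def diphthong_share(records, key):
    """Proportion of diphthong spellings in each group.

    records: dicts with a "variant" field ("diphthong" or "monophthong")
    plus whatever fields the grouping uses; key maps a record to a group.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [diphthongs, total]
    for rec in records:
        tally = counts[key(rec)]
        tally[0] += int(rec["variant"] == "diphthong")
        tally[1] += 1
    return {group: d / n for group, (d, n) in counts.items()}


# Invented attestations, for illustration only.
records = [
    {"year": 1175, "text_type": "lyric", "variant": "diphthong"},
    {"year": 1190, "text_type": "lyric", "variant": "monophthong"},
    {"year": 1320, "text_type": "prose", "variant": "diphthong"},
    {"year": 1340, "text_type": "prose", "variant": "diphthong"},
]
by_period = diphthong_share(records, lambda r: period(r["year"]))
by_type = diphthong_share(records, lambda r: r["text_type"])
print(by_period)  # {'1150-1200': 0.5, '1300-1350': 1.0}
print(by_type)    # {'lyric': 0.5, 'prose': 1.0}
```

In these toy records, date and text type are perfectly confounded, so the two tabulations cannot separate their effects. A real analysis would need to check for such overlaps between parameters.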

4.2 The Multiple Manuscript Problem

One very important concern when considering phonology in manuscripts is variation between manuscripts. Even where there are dozens of manuscripts of a text, the COM gives a single critical edition, washing out a great deal of the variation. Many of the texts, particularly the prose texts, come down to us in only one manuscript, but others occur in multiple manuscripts. This is particularly true of the lyric poetry, since selections of lyric poetry were reproduced in dozens of chansonniers, or troubadour songbooks. It is also true of some other texts, such as the Breviari d’Amor, pieces of which occur in fourteen different manuscripts.

Regardless of whether the manuscript versions of a poem are identical or vary wildly, only one version is given in the COM, as discussed in Section 1.3.3. Ricketts has announced a planned fourth part of the Concordance de l’Occitan Médiéval, which will include all of the manuscripts for each lyric poem, but this part of the project has not yet been completed. For the time being, at least, the COM gives a single reading of the vast majority of the Old Occitan texts.² It does so without the benefit of a critical apparatus in which to discuss important differences between manuscripts.

In his 2008 thesis, Massimiliano de Conca considered all of the manuscript versions of the poems of Arnaut Daniel and developed his own diplomatic editions of these texts. The published thesis includes the full text of every poem in every manuscript.

For example, one of the poems attributed to Arnaut Daniel, Si·m fos Amors de joi donar tant larja, is attested in twenty different manuscripts, no two of which give identical readings of the poem at the word level, let alone at the level of spelling or phonology.

The sixth cobla, or stanza, of the poem as it occurs in the COM is given in (4).

2 There are, however, a few texts, such as the Regles de Trobar, for which two manuscript versions of the text are included in the COM.

(4) Fals lausengiers, fuocs la lenga vos arja e que perdatz los oillz ams de mal cranc, que per vos son estrait caval e marc: Amor toletz qu’ab pauc del tot no tomba; confonda·us Deus, que ja no sapchaz com vos faitz als drutz maldir e viltener: mals astres es qui·us ten desconoissenz, et es pejor on hom vos amonesta. (PC 029 017)

False liars, may fire burn your tongues and may you lose both your eyes to cancer; may horses and brands be lost on account of you, you who take away love so that it almost falls away entirely; may God confound you, and I do not know why you make lovers curse and despise you; it is an evil star that holds you unknowing, and it is worse when someone tries to help you.

There are several words in the cobla in (4) in which the glide-initial diphthongs occur in Old Occitan: fuocs, oillz, Deus, and pejor. Only one of these, fuocs, occurs with the diphthong in this passage; the other three words occur here with monophthongs.

When we consider the different manuscripts, however, the question of the use of the glide-initial diphthongs is much less clear. Figure 4.1 shows de Conca’s breakdown of how this cobla is written in each of the nineteen texts in which it occurs.3

3 Although the poem is attested in twenty manuscripts, the sixth cobla is not included in Ms. f.

Ms. A
41 Fals lauzenier, fuecs las lengas vos arga
42 e que perdatz los huoills amdos de cranc,
43 que per vos son estraich cavail e marc,
44 amor toletz c’ab pauc del tot non tomba;
45 confonda·us Dieus que ja non sapchatz com
46 qu·us faitz als drutz mal dir e vil tener:
47 malastres es qe·us ten desconoissens,
48 car pejor etz qui plus vos amonesta.

Ms. H
41 Fals lausengiers, foc las lenguas vos arga
42 o qe perdatz ams los oillz de mal cranc,
43 qe per vos son estrait caval e marc,
44 amor tolletz ’ab pauc del tot non tomba;
45 confonda·us Deus e sai vos dire com,
46 q’al drutz vos faitz mal dir e vil tener,
47 e per vos es casutz prez e jovenz
48 et es pejor on plus vos amonesta.

Ms. B
41 Fals lausenier, focs las lengas vos arga
42 e que perdatz los huoills amdos de cranc,
43 que per vos son estraich cavaill e marc,
44 amor toletz c’a pauc del tot non tomba;
45 coffonda·us Dieus que ja non sapcbatz com
46 qe·us faitz als drutz mal dir e vil tener:
47 malastres es qe·us ten desconoissens,
48 car pejor etz qui plus vos amonesta.

Ms. I
41 Fals lausengiers, fuocs la lenga vos arga
42 o que perdatz tuch los oills de mal cranc,
43 car per vos son estraich caval e marc,
44 amor tolles c’a pauc del tot non tonba;
45 confonda·us Dieus, e sai vos dire com
46 c’als drutz vos faitz mal dir e vil tener:
47 e per vos es cazutz pretz e jovenz
48 et est pejor que plus vos n’amonesta.

Ms. L
41 Fals lausengers, fuocx la langa vos arja,
42 o qe perdatz los oillz ab u mal canch,
43 e qe fossaz tuit ferit de mal cranc,
44 qar per vos so estrat caval e marc,
45 amor tolesz q’ab pauc del tot non tomba;
46 confonda·os Dieus, qe ja non sapchasz com
47 vos faich als drutz mal dir e vil tener,
48 malsastres qi·os ten desconoisseinz
49 qe pezors es c’om plus vos amonesta.

Ms. K
41 Fals lausenzier, fuocs la lenga vos arga
42 o que perdatz tuich los oills de mal cranc,
43 car per vos son estraich caval e marc,
44 amor tolles c’a pauc del tot non tomba;
45 confonda·us Dieus, e sai vos dire com
46 c’als drutz vos faitz maldir e vil tener
47 e per vos es cazutz pretz e jovenz
48 et est pejor que plus vos n’amonesta.

Ms. Q
33 Fals lausengers foc las lengas vos arda
34 au qe pregaç toç los oilç de mal cranc
35 qe per vos son estrait caval e marc
36 c’amor toleç q’a pauc del tot nom
37 confunda·os Deus e ja non sabreç cum
38 vos faiç alç druç mal dir ni vil tener
39 malsastres es qi osten desconoissenç
40 qe peior es qant hom vos ammonesta.

Ms. N2
41 Fals lausengiers, fuocs las lengas vos arga
42 o que perdatz tuich los oils de mal cranc,
43 car per vos son estraich caval e marc,
44 amor tolles c’a pauc del tot non tomba:
45 confonda·us Dieus, e sai vos dire com
46 c’als drutz vos faitz maldir e viltener,
47 e per vos es cazutz pretz e jovenz,
48 et est pejor qui vos n’amonesta.

Ms. D
41 Fals lausengers, focs la lengua vos arda
42 o que perdaz los oillz tuich de mal tanc,
43 que per vos son estrait caval e marc,
44 amors tolez c’ap pauc del tot no·m tomba;
45 cofonda·us Deus que ja no sap bez co
46 vos faiç al druz mal dit e vil tener
47 e per vos es cazuz prez e jovenz
48 que peior es can om vos amonesta.

Ms. V
33 Fals lausengiers fuecs las lengas vos arja
34 o qe perdatz ambs los oilz de mal cranc
35 car per vos son estrat cavalz e marc
36 amor tolez q’a pauc del tot non tomba
37 confonda·us e sai vus dir com
38 qe·us faz als drutz mal dir e ver tener
39 car per vos es pretz cazutz e jovenz
40 et es peior cant hom vos n’amonesta.

Ms. P
36 Fals losengiers foc las lenguas vos arja
37 et qe fossaz tuit ferit de mal cranc
38 car per vos son estrat cavals e marc
39 amor tolez c’a pauc de joi non tomba
40 confonda·us Deus car ges non sabez com
41 ves faiz als druz mal dir e vil tener
42 malsastres es qe·us teng desconoiscenz
43 car pejors es com plus vos amonesta.

Ms. C
33 Fols lauzengiers, fuec las lenguas vos arga
34 e que perdatz ams los huelhs per mal cranc,
35 que per vos son estrag caval e marc,
36 qu’amor baissatz qu’a pauc del tot no plomba;
37 cofonda·us Dieus e sai vos dire com
38 vos faitz als drutz mal dire e vil tener:
39 malsastres es qui·us sec desconoissens,
40 que piegers es qui plus vos amonesta,

Ms. U
41 Fals lausengier focs la lenga vos arja
42 o qe perdaz los oilz ab un mal canc
43 qe per vos son estrait caval e marc
44 amor tollez q’a pauc de tot non tomba
45 confonda·us Dieus qe ja non sabes com
46 vos faiz als druz mals dir e vil tener
47 malastres es que·us ten desconoiscenz
48 qe pegier est qant hom vos amonesta.

Ms. M
33 Fals lauzengier, fuecs las lengas vos arja
34 e qe perdas ams los hueilhs de mal cranc,
35 qar per vos son estrainh cavall e marc,
36 e vos amor tolles q’a pauc non tomba,
37 confonda·us Dieus, e sai vos dire com
38 qe·us fassa·ls drutz maldir e villtener,
39 quar per vos es cazutz prez e jovenz
40 e valez mens qant hom vos amonesta.

Ms. F
25 Fals lausengiers, focs las legnas vos arja
26 o qe perdatz ambs los oillz de mal cranc,
27 qar per vos son estrait caval e marc,
28 amor tolez q’a pauc del tot non tomba;
29 confunda·us Deus qe ja non sapchatz com
30 q’als druz vos faitz mal dir e vil tener:
31 malastres es q’eu ten desconoissenz
32 qar pejor etz qan hom vos amonesta.

Ms. R
41 Fals lauzengiers
42 e que preguatz vostres huelhs ab mal ranc
43 que per vos son estranh caval e marc
44 c’amors torbatz c’ap pauc del tot non comba
45 cofonda·us Dieu e ja no veyatz com
46 vos faytz als drutz mal dir e vil tener
47 mal estares qu’ieu ten desconoissens
48 que pejors es qui pus vos amonesta.

Ms. S
41 Fals losengiers foc las lenguas vos arja
42 et qe fosaz tuit ferit de mal cranc
43 qar per vos son estrat cavals et marc
44 amor tolez c’a pauc de joi non tomba
45 confonda·us Deus car ges no sabez com
46 vos faiz als druz mal dir et vil tener
47 malsastres es qe·us teng desconoiscenz
48 qar pejors es com plus vos amonesta.

Ms. Sg
33 Fals lausengiers focs las lenguas vos arga
34 e perdatz totz los oilhs de mal cranc
35 car per vos son estragh caval e marc
36 c’amors toles c’a pauc del tot nos plumba
37 confonde·us Deus e non vires ja con
38 vos fais mal dir als drut e vil tener,
39 malsastres es qui·os ten desconoissens
40 peiors es on hom vos monesta.

Ms. c
41 Fals lausengiers focs la lengua vos argua
42 o qe perdaz los oils ab un mal tang
43 qe per vos son estrait caval et marc
44 amor tollez q’a pauc de tot non tomba
45 confonda·us Dieus qe ja non sabes com
46 vos faiz als druz mal dir e vil tener
47 malastres es qe·us ten desconoiscens
48 qe pegier es qant hom vos amonesta.

Ms. f
[…]

Figure 4.1 Cobla VI of Si·m fos Amors de joi donar tant larja (de Conca 2008, pp. 205–6)

Rather than occurring consistently throughout the manuscripts, the diphthong form fuocs occurs in only eight manuscripts, while ten other manuscripts have a monophthongal focs and one manuscript omits the half line in which the word occurs.

Similarly, oillz occurs without the diphthong in twelve manuscripts but with a diphthong (as huoills, huelhs, or hueilhs) in five of the manuscripts.4 Pejor occurs without the diphthong in eighteen manuscripts and only once with a diphthong, in Ms. C. Lest it seem that the version in the COM simply uses the variant occurring in the majority of the manuscripts, Deus occurs as Dieus, with a diphthong, in eleven manuscripts and without the diphthong in seven manuscripts,5 but the COM does not use the diphthong.
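The manuscript counts above are simple tallies over Figure 4.1. As a minimal sketch for fuocs, the readings below are transcribed from the figure, and the spelling rule (a ue or uo sequence counts as a diphthong) is an assumption that happens to be adequate for this one word; it is not offered as the procedure used in the study.

```python
# Readings of 'fuocs' in the first verse of cobla VI, transcribed from Figure 4.1.
# None marks the manuscript in which this half line is missing (Ms. R).
readings = {
    "A": "fuecs", "H": "foc", "B": "focs", "I": "fuocs", "L": "fuocx",
    "K": "fuocs", "Q": "foc", "N2": "fuocs", "D": "focs", "V": "fuecs",
    "P": "foc", "C": "fuec", "U": "focs", "M": "fuecs", "F": "focs",
    "R": None, "S": "foc", "Sg": "focs", "c": "focs",
}


def tally(readings: dict) -> dict:
    """Count diphthong (ue/uo), monophthong, and missing readings."""
    counts = {"diphthong": 0, "monophthong": 0, "missing": 0}
    for form in readings.values():
        if form is None:
            counts["missing"] += 1
        elif "ue" in form or "uo" in form:
            counts["diphthong"] += 1
        else:
            counts["monophthong"] += 1
    return counts


print(tally(readings))  # {'diphthong': 8, 'monophthong': 10, 'missing': 1}
```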

Because of this variation, the texts in the COM cannot be taken as a faithful representation of the spelling patterns of the manuscript tradition; each is better considered as simply one variant of that text. In fact, the text given in the COM is very similar to that of Ms. I and Ms. K. On the other hand, side-by-side texts of the different manuscript versions of some texts are available. De Conca’s thesis gives the side-by-side texts of all nineteen poems of Arnaut Daniel, and the pilot project of the proposed fourth part of the COM includes such side-by-side texts for ten other poems. Comparing these manuscript variants with the text given in the COM, we can develop a modified statistical method that allows a meaningful quantitative comparison of the pattern of the use of the glide-initial diphthongs by text type. This method is discussed below in Section 4.6.

Because the problem of multiple manuscripts is most severe for the lyric poetry, text type is the only parameter greatly affected. The analyses of date and dialect use only the non-lyric poetry and prose texts that can be securely dated and located. The vast majority of these occur in only one manuscript, so the modified statistical method is not necessary for these analyses. It is necessary, however, for the analysis by text type, because many of the lyric poems occur in multiple manuscripts.

4 The word is not found in Ms. P and Ms. S, where the phrase including oils is replaced by fossaz tuit ferit. 5 The word is omitted in Ms. V.

4.3 Glide-Initial Diphthongs in the Concordance de l’Occitan Médiéval

4.3.1 Methodology

In order to explore the pattern of attestation of the glide-initial diphthongs, I used a method very similar to that discussed in Chapter Three for adjective derivation. The COM was searched exhaustively for word forms containing the diphthongs in question. The strings -ie-, -je-, -ye- were used to find examples of the front diphthong, and -ue-, -uo-, -wo-, -we- were used to find the back diphthong.
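The string search just described amounts to a substring filter over word forms. The sketch below is a minimal illustration, assuming the word forms have already been extracted from the COM as plain strings; the function name classify_candidate is hypothetical and is not the tool that was actually used to search the COM. A match only makes a form a candidate, which is why the hand-exclusion steps described next were needed.

```python
# Search strings from the text.
FRONT_STRINGS = ("ie", "je", "ye")        # front diphthong spellings
BACK_STRINGS = ("ue", "uo", "wo", "we")   # back diphthong spellings


def classify_candidate(form: str) -> set:
    """Return which searches ('front', 'back') a word form matches.

    Matching is case-insensitive so capitalized forms are also found.
    A match does not prove the spelling reflects diphthongization.
    """
    lowered = form.lower()
    hits = set()
    if any(s in lowered for s in FRONT_STRINGS):
        hits.add("front")
    if any(s in lowered for s in BACK_STRINGS):
        hits.add("back")
    return hits


for word in ["Dieus", "fuocs", "focs", "huelhs", "fiel"]:
    print(word, sorted(classify_candidate(word)))
```

Note that fiel is caught by the front search even though its [je] arose from hiatus rather than from diphthongization of a mid vowel.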

The resulting list of words included a large number of words which did include the sounds in question, pronounced as a glide-initial diphthong, but which did not develop from the same source as the glide-initial diphthongs introduced in Section 4.1.

These words were excluded from the list of diphthong forms of interest. Some of these words were inherited from Latin with this sequence of sounds, such as pieta ‘innocent’ from Latin pietas. In addition, the loss of voiced intervocalic stop consonants introduced many vowel combinations in Old Occitan. If one of those vowels was a high vowel, it lost its syllabicity and attached to the neighboring vowel as a glide to form a diphthong. From Latin fidelis, for instance, the loss of intervocalic /d/ resulted in the Old Occitan fiel ‘faithful’, presumably through a stage in which the vowels /i/ and /e/ were in hiatus before the /i/ became a glide. It is important to note that diphthongs formed in this way were, for speakers, the “same sound” as the results of the diphthongization discussed in this chapter. In the rhyme scheme of the troubadour poetry, fiel, for instance, frequently rhymes with ciel ‘sky’ and other words with newly developed diphthongs. In addition, words which developed [je] in both ways are spelled with the same range of variants.

Words which developed the sounds from sources other than the diphthongization of the mid vowels were excluded from the analysis of the glide-initial diphthongs.

In addition, verbs were removed from the list. The decision to remove verbs from the list resulted from the problem of defining what counts as the “same word”. Verbs, to a much greater extent than nouns, have different stems in different parts of the conjugation.

The verb endings also overlap to a large extent with other derivational endings. -an, for example, is the third-person plural ending in various tenses, but is also an adjective- and noun-deriving suffix. Because of this, it was not possible to find and include all forms of the verb in this analysis without including many forms of other words. The verbs were therefore removed from the list and not included in the analysis which follows. In addition, forms which were identical to verbs had to be excluded for a similar reason. The word deu ‘god’ occurs with a diphthong (e.g. dieu) hundreds of times in the COM, but it could not be analyzed quantitatively here because the monophthongal variant, deu, is identical to the third-person singular present tense form of dever ‘to owe; should’. Any count of forms of deu would include many tokens of the verb rather than the noun, but the verb, to my knowledge, never appears with the diphthong. Many other nouns were excluded for the same reason: trolh ‘pressure’ is identical to forms of trolher ‘to put pressure on’, pueg ‘hill’ is identical to forms of pogar ‘to climb’, brolh ‘grove’ is identical to forms of brolher ‘to push’, and so on. Some forms are even more problematic: voig ‘empty’ occurs both with and without diphthongs, but the forms without diphthongs are identical to forms of votz ‘voice’ as well as to forms of the verbs vojar ‘to empty’ and vogar ‘to sail’. Though these words were excluded from this analysis, the verbs, and words identical with verb forms, are indeed forms of interest and should be analyzed in a further study.
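The homograph problem can be stated as a set intersection: a lemma cannot be counted automatically if any of its spellings is also a form of some other word, such as a verb. The sketch below assumes hand-built variant lists and a hand-built set of verb forms; the function name and the tiny word lists are hypothetical illustrations modeled on the examples in the text, not the lists actually used.

```python
def ambiguous_lemmas(variants: dict, other_forms: set) -> dict:
    """Map each lemma to the spellings it shares with other words.

    variants    -- lemma -> list of its spellings (with and without diphthongs)
    other_forms -- lower-case spellings of other words, e.g. verb forms
    Lemmas whose spellings never clash are left out of the result.
    """
    clashes = {}
    for lemma, spellings in variants.items():
        shared = {s.lower() for s in spellings} & other_forms
        if shared:
            clashes[lemma] = sorted(shared)
    return clashes


# Toy data modeled on the examples above; not the study's actual lists.
noun_variants = {
    "deu": ["dieu", "Dieus", "deu", "deus"],      # 'god'
    "uelh": ["uelh", "huelhs", "oill", "oillz"],  # 'eye'
}
verb_forms = {"deu"}  # 3sg present of dever 'to owe; should'

print(ambiguous_lemmas(noun_variants, verb_forms))  # {'deu': ['deu']}
```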

Another group of words eliminated were those in which a back diphthong developed following a velar [g] or [k]. In the Old Occitan texts, the letters for these consonants, when followed by a front vowel, often had a u inserted between the consonant and the vowel that served to mark the consonant as velar rather than palatalized. Though this usage was initially restricted to front vowels, u as part of a digraph (gu or, more commonly, qu) occurs before all vowels in many Old Occitan texts (Grafström 1958). This usage makes it very difficult to determine when a spelling with u after g or q represents a diphthong and when it is a digraph representing the velar followed by a simple vowel. For example, in a spelling such as orguolh ‘pride’, it is not at all certain whether the vowel is a diphthong or not, though we know that orguelh did frequently occur in diphthongized forms. To avoid confusion and an inaccurate quantitative analysis, all words in which the back diphthong [wo]/[we] is preceded by a velar were removed from the list and are not included in the analysis which follows. Like the verbs discussed above, they provide fertile ground for future studies of the variation. Front diphthongs following a velar, however, are included.
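A spelling check can at least flag the problematic forms. The sketch below is a hypothetical heuristic, not the procedure used here (whole words were removed by hand). It checks only back-diphthong spellings, since front diphthongs after velars were kept, and it also flags ordinary digraph spellings such as que, which is exactly why such forms cannot be counted automatically.

```python
import re

# u + e/o directly after g or q: the u may belong to a gu/qu digraph
# (velar consonant + simple vowel) rather than being the glide of [we]/[wo].
VELAR_DIGRAPH = re.compile(r"[gq]u[eo]")


def ambiguous_after_velar(form: str) -> bool:
    """Return True if a form contains g/q + u + e/o and needs hand-checking."""
    return VELAR_DIGRAPH.search(form.lower()) is not None


for word in ["orguolh", "orguelh", "que", "fuocs", "Dieus"]:
    print(word, ambiguous_after_velar(word))
```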

After the list of word forms including the diphthongs of interest was thus reduced in these ways, a list of lemmas and variants was created, as was done with the derived adjectives in Chapter Three. Again, spelling variations, punctuation variants, and case forms were collapsed into a single list of variants for each lemma. Each derived form was considered a different word, so that there are often groups of clearly related words such as viell, viellart, and viellmen. That is, I consider the occurrence of diphthongs in various words, rather than different morphemes or roots. This decision was made because of the difficulties posed by spelling variants and stem changes which a morpheme or root undergoes when it occurs in different words formed by suffixation and compounding.

Because of this, and because the COM is not a tagged corpus, searching for all cases of a root or morpheme would be extremely time-consuming and could almost certainly not be done accurately. Using words instead allows for a more manageable and accurate quantitative analysis. In all, the lists contained variants of eighty-three words in which the front diphthong occurs and fifty-one words in which the back diphthong occurs. An example list of diphthong variants of a single lemma, uelh ‘eye,’ which appears in its monophthongal form oillz in the COM text in (4), is given in (5).


(5) ueil uel ueyl uoilhs vostr’hueilh/ uoylll uols ‘uolh uolhs uolh uol uoill/ uoill uogll uoglls uogllz ‘ueyll ueylls ueyls ‘uel uiels uiel uelh ‘uelh uelhs ‘uelhs ‘uelhs) ‘uell uell uells ‘uells ‘uelly ‘uellz uels ‘uels uelz ‘ueil ueilh ‘ueilh ‘ueill ueill ueills ‘ueills ueils ueiltz ‘ueilz ueilz c’uoill hueylhs ‘uehllz uehls l’uoill/6 l’uoill L’uoill l’uogll l’ueil l’uelh L’uelh l’uelhs L’uell l’ueilh l’ueilhs l’ueill l’huoill l’hueill l’huelh huoil huoill/ huoill huoills huolz huols huolhs @Huoills7 Huoills huoils huoilz hueyll hueyllz hueyls hueylz c’uoills huelg huelh huelh/ Huelh huelhs Huelhs huelhz hueli huell huelli huells huelltz huelly huellz huels Huels huelx huelz huelhs ‘huelhs ‘huelh huel ‘huel hueil hueilhs Hueill hueills hueilltz hueils Hueils Hueilhs hueill hueill/ hueilh d’uells d’uelhs d’ueilhs d’uelh vostr’ueilh vostr’uelh

While the list of words in which this diphthong developed is of interest, the pattern with which they occur is more interesting for the purposes of this thesis. In order to explore this pattern, a corresponding list of variants in which these same lemmas occur without diphthongs was developed. Thus, parallel to the list of diphthong variants in (5) is the list of monophthong variants of the same lemma, given in (6).

(6) oil Oil oilh l’oyll vostr’oill qu’olh l’oill l’olh l’oils l’oil d’olhs d’olh hoilz d’oilz d’oils d’oillz d’oill hoyl hoils hoil ollz oyls oylls oyll oyl olz ols olls oll ol olhz olhs Olh olh oill o·ill @Oill oills oilltz oillz Oillz oils Oils oiltz oilz Oilz

The frequency of the diphthong forms in (5) can thus be expressed as a percentage of the total number of tokens of the lemma, both with and without diphthongs. Taking a proportion, rather than a simple count, gives a much more accurate picture of the pattern of variation, because it is not skewed by differences in how often each lemma, or each group of texts, occurs in the corpus.
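The proportion for a single lemma can be sketched as follows, assuming the two hand-prepared variant lists are available as sets. Only a few forms from (5) and (6) are included, and the token counts are invented for illustration.

```python
from collections import Counter

# Abbreviated variant lists for the lemma uelh 'eye' (a few forms from (5) and (6)).
DIPHTHONG_FORMS = {"uelh", "huelhs", "uoill", "hueill"}
MONOPHTHONG_FORMS = {"oill", "oillz", "olh", "oils"}


def diphthong_share(tokens: Counter) -> float:
    """Proportion of a lemma's tokens that use a diphthong variant.

    tokens maps each attested spelling to its frequency; spellings in
    neither list belong to other lemmas and are ignored.
    """
    diph = sum(n for form, n in tokens.items() if form in DIPHTHONG_FORMS)
    mono = sum(n for form, n in tokens.items() if form in MONOPHTHONG_FORMS)
    total = diph + mono
    if total == 0:
        raise ValueError("lemma not attested")
    return diph / total


sample = Counter({"huelhs": 3, "oillz": 5, "olh": 2, "focs": 4})  # invented counts
print(diphthong_share(sample))  # 0.3
```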

6 The l’ in l’uoill and other forms is the definite article. 7 @ is the way in which the COM program marks capitalization.

Like the derived adjectives discussed in Chapter Three, the glide-initial diphthongs were too numerous to analyze accurately by hand, so a computer program was used to find every occurrence of each variant of each lemma, and the frequency of each variant, within each group of texts to be analyzed: each time period, each dialect, and each text type. The program then created a list of the lemmas that are found in each group of texts and calculated the number of lemmas and the total number of attestations of each lemma for the group of texts. The computer program did not identify the variants of each lemma itself; it used the hand-prepared lists of variants to find the lemma frequency.
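A program of the kind described can be sketched as a lookup from each listed spelling to its lemma and variant type, followed by counting per group of texts. This is a minimal illustration, not the program actually used; the LOOKUP table, the function name summarize, and the token lists are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical lookup built from the hand-prepared lists:
# spelling -> (lemma, 'D' for a diphthong variant or 'M' for a monophthong variant)
LOOKUP = {
    "huelhs": ("uelh", "D"), "oillz": ("uelh", "M"),
    "fuocs": ("fuoc", "D"), "focs": ("fuoc", "M"),
}


def summarize(texts):
    """texts: iterable of (group, tokens) pairs, where a group is e.g. a dialect.

    Returns group -> lemma -> Counter({'D': n, 'M': m}). Tokens that are not
    in the lookup are skipped: only listed variants are counted.
    """
    result = defaultdict(lambda: defaultdict(Counter))
    for group, tokens in texts:
        for token in tokens:
            entry = LOOKUP.get(token.lower())
            if entry is not None:
                lemma, kind = entry
                result[group][lemma][kind] += 1
    return result


corpus = [  # invented token lists
    ("Gascon", ["Focs", "focs", "huelhs", "e"]),
    ("Provençal", ["fuocs", "oillz", "fuocs"]),
]
for group, lemmas in summarize(corpus).items():
    n_tokens = sum(sum(c.values()) for c in lemmas.values())
    print(group, len(lemmas), "lemmas,", n_tokens, "tokens")
```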

In addition, for words with back diphthongs such as uelh, a separate set of variant lists was created, with the [wo] forms in one list and the fronted [we] forms in another. This makes it possible to investigate the claim, introduced above in Section 4.1, that the development of one or the other is a function of dialect.
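The [wo] and [we] lists described here were prepared by hand. A spelling-based pre-sort such as the one below is one way such lists could be drafted before checking; the mapping of uo/wo spellings to [wo] and ue/we spellings to [we] is an assumption, and the function name is hypothetical.

```python
def back_variant(form: str) -> str:
    """Classify a back-diphthong spelling as '[wo]' or '[we]'.

    Assumption: uo/wo spellings stand for [wo], ue/we spellings for [we].
    Returns 'unclear' if both or neither spelling appears, for hand-checking.
    """
    f = form.lower()
    has_wo = "uo" in f or "wo" in f
    has_we = "ue" in f or "we" in f
    if has_wo and not has_we:
        return "[wo]"
    if has_we and not has_wo:
        return "[we]"
    return "unclear"


for word in ["uoill", "hueilh", "fuec", "buou", "focs"]:
    print(word, back_variant(word))
```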

4.3.2 Results

In the entire COM corpus, the eighty-three words in which the front diphthong developed appeared 104,407 times. The diphthong [je] occurred in 30,369 of these tokens, or 29.1%, while the monophthong [e] occurred in 74,038 tokens, or 70.9%. The fifty-one words analyzed in which the back diphthong developed appeared a total of 64,568 times; 16,187 of these tokens, or 25.1%, occurred with the diphthong [wo]/[we], and 48,381 tokens, or 74.9%, occurred with the monophthong [o]. Taken together, 167,975 tokens of 134 words were analyzed, 46,556, or 27.7%, of them occurring with diphthongs and 121,419, or 72.3%, of them occurring with monophthongs. This breakdown is shown in Table 4.1.

Table 4.1 Total Diphthong and Monophthong Forms in the Entire COM Corpus

                                     Diphthongs       Monophthongs      Total Tokens
Front diphthongs ([e] ~ [je])        30,369 (29.1%)   74,038 (70.9%)    104,407
Back diphthongs ([o] ~ [wo]/[we])    16,128 (25.0%)   48,381 (75.0%)     64,509
Total words analyzed                 46,506 (27.7%)   121,419 (72.3%)   167,925

The front diphthongs occur more frequently than the back diphthongs by about four percentage points (29.1% versus 25.0%), a difference which, with so many tokens, is highly significant. A chi-square test of independence shows that the choice between diphthong and monophthong is not independent of whether the vowel is front or back (χ2 = 322.38, p < 0.0001). Because the two sets of diphthongs behave differently, it is possible that their use is influenced by different parameters. For this reason, the front and back diphthongs will be analyzed separately in the rest of the chapter.
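The gap between the front and back diphthongs is an absolute difference in percentage points, which is easy to confuse with a relative difference. A quick check using the counts in Table 4.1 shows both measures:

```python
# Counts from Table 4.1.
front_diph, front_total = 30_369, 104_407
back_diph, back_total = 16_128, 64_509

front_pct = 100 * front_diph / front_total
back_pct = 100 * back_diph / back_total
gap_points = front_pct - back_pct            # absolute difference (percentage points)
gap_relative = 100 * gap_points / back_pct   # relative difference (% of the back rate)

print(f"front {front_pct:.1f}%, back {back_pct:.1f}%")
print(f"gap: {gap_points:.1f} percentage points ({gap_relative:.0f}% relative)")
```

Reported as a relative difference, the same gap would be about 16%, so stating it in percentage points avoids ambiguity.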

The glide-initial diphthongs appear in approximately a quarter of the attestations of these words. The remainder of the chapter will consider whether the date, dialect, and text type of the text influence the speaker’s choice of whether to use a diphthong variant or a monophthong variant.

4.3.3 [wo] and [we]


One other issue of interest is the use of the two variants of the back diphthong.

There are two diphthongs that develop from the mid back vowel: the expected [wo] and a fronted variant [we]. The difference between the two has long been claimed to be at least partially a dialectal feature, in which the fronted variant [we] occurs alongside [wo] in the Languedoc and Provençal dialects, but [wo] is preferred in the northern and western dialects.

In the COM corpus, of the 16,128 tokens in which the back diphthong occurs, 5,273, or 32.7%, are instances of the [wo] variant, while the other 10,855, or 67.3%, are instances of the fronted [we] variant. In addition to considering how the date, dialect, and text type affect the use of diphthong and monophthong variants, I also consider how they relate to the choice of the [wo] or [we] variant of the back diphthong. Because this is considered to be a dialect difference, I turn first to the analysis of the texts by dialect in order to test this claim.

4.4 Analysis of Glide-Initial Diphthongs by Dialect of Text

In this section, I consider the use of diphthong and monophthong variants by the dialect of the text. As in the analyses of the preceding chapters, this analysis is based on the division into seven dialects presented in Section 1.7.2, with only non-lyric poetry and prose texts with moderately secure locations included in the analysis. In each group of texts, the lemmatized lists were used to find all variants of the words in question, both with and without diphthongs. The results for the combined diphthongs, both front and back, are given in Table 4.2.


Table 4.2 Front and Back Diphthongs by Dialect of Text

Dialect             Diphthongs       Monophthongs     Total
Alpine Provençal    187 (18.0%)      852 (82.0%)      1,039
Auvernhat           43 (10.3%)       375 (89.7%)      418
Gascon              1,888 (18.3%)    8,406 (81.7%)    10,294
Languedoc           4,121 (27.8%)    10,708 (72.2%)   14,829
Limousin            19 (20.0%)       76 (80.0%)       95
Provençal           2,431 (36.5%)    4,226 (63.5%)    6,657
Waldensian          712 (10.1%)      6,362 (89.9%)    7,074

While some of the dialects do seem to pattern similarly, it is immediately clear that there are differences in the usage of diphthongs among the dialects. The Provençal dialect uses the largest proportion of diphthongs, while the Waldensian and Auvernhat dialects use glide-initial diphthongs less than half as often. When we consider only the front diphthongs, these differences are even greater, as shown in Table 4.3 and Figure 4.2.


Table 4.3 Front Diphthongs by Dialect of Text

Dialect             Diphthongs       Monophthongs     Total
Alpine Provençal    47 (7.56%)       575 (92.44%)     622
Auvernhat           19 (6.88%)       257 (93.12%)     276
Gascon              1,532 (27.14%)   4,113 (72.86%)   5,645
Languedoc           2,715 (35.19%)   5,001 (64.81%)   7,716
Limousin            6 (11.32%)       47 (88.68%)      53
Provençal           1,481 (33.81%)   2,899 (66.19%)   4,380
Waldensian          12 (0.31%)       3,818 (99.69%)   3,830

[Chart: percentage (0% to 100%) of monophthong and diphthong variants of the front vowel for each dialect]

Figure 4.2 Percentages of Front Diphthong Variants by Dialect of Text

The differences in the pattern of usage of the front diphthong are highly statistically significant (χ2 = 1972.1; p < 0.0001). The most immediately obvious difference between the dialects is that there are almost no front diphthongs in the Waldensian texts at all. Only twelve tokens of the front diphthong occur in this dialect: eleven tokens of nier ‘black’ and one token of rienc ‘kingdom’. These occur in four texts: Le Bestiaire vaudois, the Nouveau Testament vaudois de Zurich, Vergier de cunsollacion e altri scritti, and one of the Trois Sermons vaudois sur le Jugement Dernier. Otherwise, the front diphthong is not found in the Waldensian dialect texts. The patterns found in the other dialects cluster into precisely the dialect groups suggested by Bec (1986): the Alpine Provençal, Auvernhat, and Limousin dialects pattern together; Languedoc and Provençal pattern together; and Gascon is different from all the others.

The Alpine Provençal, Auvernhat, and Limousin dialects, which Bec (1986) calls Northern Occitan, use diphthongized variants of the front vowels in these words only roughly seven to eleven percent of the time. While this is more than the almost non-existent occurrences in the Waldensian dialect, it is much less than in the remaining dialects. A separate chi-square test for independence on the outcomes in these three dialects shows that the differences between them are not at all significant (χ2 = 1.251, p = 0.5349), though the difference between them and the other dialects is indeed significant (χ2 = 1971.7; p < 0.0001). In texts from the Alpine Provençal, Auvernhat, and Limousin dialect areas the front diphthong is used in an average of 12% of the lemmas in each text.
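Assuming the reported test is Pearson's chi-square on the front-diphthong counts in Table 4.3, it can be reproduced with standard-library Python. The sketch below recovers χ2 = 1.251 and p = 0.5349. With three dialects and two outcomes there are two degrees of freedom, for which the chi-square upper-tail probability has the closed form exp(-χ2/2); that shortcut does not hold for other degrees of freedom.

```python
import math

# Front diphthong counts from Table 4.3: (diphthong, monophthong) per dialect.
table = {
    "Alpine Provençal": (47, 575),
    "Auvernhat": (19, 257),
    "Limousin": (6, 47),
}


def pearson_chi_square(rows):
    """Pearson's chi-square statistic (no continuity correction) for an r x c table."""
    rows = [list(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand
            stat += (observed - expected) ** 2 / expected
    return stat


chi2 = pearson_chi_square(table.values())
df = (len(table) - 1) * (2 - 1)   # (rows - 1) * (columns - 1) = 2
p = math.exp(-chi2 / 2)           # upper-tail probability, valid for df = 2 only
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.4f}")  # chi2 = 1.251, df = 2, p = 0.5349
```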

In the Limousin dialect, this average is driven by a single text, the Dictionnaire provençal-latin, in which the front diphthong occurs in 30% of the lemmas and 20% of the tokens of the words analyzed here. The front diphthong is not attested in the other texts from the Limousin dialect area. In the Alpine Provençal and Auvernhat dialect areas, however, the distributions within each text are somewhat more consistent, rather than being driven by any single text.

Similarly, the Languedoc dialect and the Provençal dialect, which Bec (1986) calls Meridional Occitan, pattern together, with the diphthongs appearing approximately one third of the time. The difference between the patterns of usage in these two dialects is not significant (χ2 = 2.267; p = 0.1321). Though there are eight texts from the Languedoc dialect area in which no diphthongs occur, most of these texts are very short and include very few of the words analyzed here. In the other 51 texts from the Languedoc dialect area, the diphthongs occur in an average of 38.6% of the tokens in each text. Though the use of the diphthongs varies somewhat from text to text, in the majority of the texts from the Languedoc dialect area it is significantly higher than the use of glide-initial diphthongs in the texts from the Northern Occitan dialects, and it is fairly consistent across a large number of texts rather than being driven by a single text or a small group of texts.
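The reported statistic for the Languedoc and Provençal comparison appears consistent with a 2 × 2 chi-square test that applies Yates's continuity correction, which some statistical software (for example R's chisq.test) applies to 2 × 2 tables by default; without the correction the statistic would be somewhat larger. The sketch below assumes that correction and uses the front-diphthong counts from Table 4.3. For one degree of freedom the upper-tail probability is erfc(sqrt(χ2/2)).

```python
import math

# Front diphthong counts from Table 4.3: (diphthong, monophthong).
languedoc = (2715, 5001)
provencal = (1481, 2899)


def yates_chi_square_2x2(row1, row2):
    """Chi-square for a 2 x 2 table with Yates's continuity correction."""
    (a, b), (c, d) = row1, row2
    n = a + b + c + d
    # Closed form: n * (|ad - bc| - n/2)^2 / (product of the four margins)
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


chi2 = yates_chi_square_2x2(languedoc, provencal)
p = math.erfc(math.sqrt(chi2 / 2))  # upper-tail probability for df = 1
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```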

The texts from the Provençal dialect area pattern similarly.

The Gascon dialect shows a pattern that falls between those of the other dialects, with diphthongs appearing more frequently than in the Alpine Provençal, Auvernhat, Limousin, and Waldensian dialects, but less frequently than in texts from the Languedoc and Provençal dialect areas. It is surprising to find such a high proportion of diphthong usage in this dialect, because Grandgent (1905) notes that diphthongization is least common in the southwest, which falls into the Gascon dialect region. Where we would expect to find the smallest proportion of diphthongs, based on Grandgent’s description, we instead find one of the highest proportions of diphthong usage, with diphthongs appearing in over a quarter of the tokens. In addition, texts from the Gascon dialect area show more variation between texts than the other dialect areas. Setting aside the shorter texts, where the presence or absence of diphthongs may be a sampling issue, texts from the Gascon dialect vary from 0.4% front diphthong variant tokens (two diphthong variants and 457 monophthong variants) in Documents du Monastère de Santa Clara to 94.5% front diphthong variant tokens (378 diphthong variants and 22 monophthong variants) in Le Censier gothique de Soule, without the strong clustering of texts around any one proportion that is found in the other dialect areas. The overall distribution of diphthong and monophthong variants in texts from the Gascon dialect area is therefore somewhat misleading, as the texts vary so much, but it is clear that the usage of the diphthongs in the Gascon dialect differs in important ways from that of the other dialect areas.
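When texts vary this much, a pooled proportion (as in Table 4.3) and an average of per-text proportions (as used for the Languedoc texts above) can diverge noticeably. The sketch below uses only the two extreme Gascon texts just cited; restricting the example to two texts is purely for illustration and says nothing about the full Gascon sample.

```python
# Front diphthong counts (diphthong, monophthong) for the two Gascon texts cited.
texts = {
    "Documents du Monastère de Santa Clara": (2, 457),
    "Le Censier gothique de Soule": (378, 22),
}

# Each text weighted equally, regardless of length.
per_text = {name: d / (d + m) for name, (d, m) in texts.items()}
mean_of_texts = sum(per_text.values()) / len(per_text)

# Each token weighted equally, so longer texts count for more.
pooled_d = sum(d for d, _ in texts.values())
pooled_total = sum(d + m for d, m in texts.values())
pooled = pooled_d / pooled_total

for name, share in per_text.items():
    print(f"{name}: {share:.1%}")
print(f"mean of per-text shares: {mean_of_texts:.1%}")  # 47.5%
print(f"pooled share: {pooled:.1%}")                    # 44.2%
```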

The individual lemmas in which the diphthongs occur do not seem to differ in important ways between the dialects, with the exception of the Waldensian dialect area.

Texts from the Waldensian dialect are unusual in that many words that occur frequently with the diphthong variants, such as mieg ‘half’, are never found with diphthong variants in the Waldensian texts, though 89 tokens of meg without the diphthong occur in these texts. In texts from the other dialects, the same words are generally found with diphthongs.

When we turn to the back diphthongs, however, we find an entirely different picture, as shown in Table 4.4 and Figure 4.3.


Table 4.4 Back Diphthongs by Dialect of Text

Dialect             Diphthongs       Monophthongs     Total
Alpine Provençal    140 (33.57%)     277 (66.43%)     417
Auvernhat           24 (16.90%)      118 (83.10%)     142
Gascon              356 (7.66%)      4,293 (92.34%)   4,649
Languedoc           1,406 (19.77%)   5,707 (80.23%)   7,113
Limousin            13 (30.95%)      29 (69.05%)      42
Provençal           950 (41.72%)     1,327 (58.28%)   2,277
Waldensian          700 (21.58%)     2,544 (78.42%)   3,244

[Chart: percentage (0% to 100%) of monophthong and diphthong variants of the back vowel for each dialect]

Figure 4.3 Percentages of Back Diphthong Variants by Dialect of Text


While the dialect difference is again highly statistically significant (χ2 = 1168.4, p < 0.0001), the differences are not the same as those discussed above for the front diphthong. While the Waldensian dialect shows almost no instances of the front diphthong, texts from this dialect area have back diphthongs occurring in over 20% of the tokens. Rather than having by far the lowest proportion of uses of the diphthong, the Waldensian dialect is in the middle of the proportions of the back diphthong. It is important to note, however, that the distribution of diphthong and monophthong variants in the Waldensian dialect does differ from that of the other dialects. While the back diphthong variants are found in many words in the other dialects, all 700 tokens of back diphthongs are in four words: buou ‘ox’, fuoc ‘fire’, encuoi ‘today’, and luoc ‘place’. No other words occur in the Waldensian texts with diphthongs. Even more interesting, only one of these words, encuoi, shows the alternation between diphthong and monophthong variants that is found for most other words in most groups of texts. The other three words occur almost categorically with diphthongs in the Waldensian texts, with the diphthong variants occurring in all 28 tokens of buou, 253 of 254 tokens of fuoc, and all 413 tokens of luoc. Thus, though the overall proportion of diphthong and monophthong variants within the Waldensian texts does not set this dialect apart, a closer look at the pattern of use of these variants shows that the dialect is quite different indeed: rather than variants with diphthongs being spread throughout the lexical items analyzed and occurring side by side with monophthong variants of the same words, the Waldensian texts include diphthong variants in only four words and lack most of the variation within words that is found in the other dialects.
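Near-categorical behavior of this kind is hidden by a pooled percentage but shows up at once in per-lemma shares. The sketch below uses the counts given above; encuoi is left out because its monophthong count is not given here, and the 0.99 threshold for "almost categorical" is an arbitrary assumption.

```python
# Waldensian back-diphthong counts cited in the text: (diphthong tokens, total tokens).
waldensian = {"buou": (28, 28), "fuoc": (253, 254), "luoc": (413, 413)}

THRESHOLD = 0.99  # hypothetical cut-off for "almost categorical"


def near_categorical(counts, threshold=THRESHOLD):
    """Return the lemmas whose diphthong share is at or above the threshold."""
    return sorted(
        lemma for lemma, (diph, total) in counts.items()
        if total and diph / total >= threshold
    )


print(near_categorical(waldensian))  # ['buou', 'fuoc', 'luoc']
```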

Though that variation is found in the Gascon dialect, texts from this dialect have the lowest percentage of occurrence of the back diphthong, which occurs only 7.7% of the time. This is what we might have expected based on Grandgent’s (1905) description, though we did not find it for the front diphthongs discussed above. Grandgent’s statements about the relative paucity of diphthongization in the southwest are best taken as referring only to the back diphthongs.

The back diphthongs are most common in the Provençal dialect, followed by the Alpine Provençal and Limousin dialects, though the Limousin dialect has relatively few tokens and there may be some sampling error because of that. In the back diphthongs, then, while the dialect of a text does clearly affect the proportion of diphthongs used, it affects the front and back diphthongs in different ways. While the front diphthongs are most common in the central dialects of Languedoc and Provençal, the back diphthongs are most common in the eastern dialects of Provençal and Alpine Provençal. This may suggest a different center from which each change spread. If this is the case, the “Big Bang” approach presented at the end of Section 4.1 would need to be modified to account for separate sound changes.

Before leaving the issue of dialect, I consider the distribution of the two outcomes of the back diphthong, [wo] and [we]. Based on the description of these outcomes, we would expect to find a high percentage of [wo] in the Alpine Provençal, Auvernhat, and Gascon dialects, with a mixture of the two variants in Languedoc and Provençal. This is not at all what occurs in the texts, however, as shown in Table 4.5.

Table 4.5 Number and Percentage of [wo] and [we] Variants by Dialect of Text

Dialect             [wo]            [we]            Total
Alpine Provençal    65 (46.4%)      75 (53.6%)      140
Auvernhat           5 (20.8%)       19 (79.2%)      24
Gascon              92 (25.7%)      266 (74.3%)     358
Languedoc           435 (30.9%)     972 (69.1%)     1,407
Limousin            5 (38.5%)       8 (61.5%)       13
Provençal           528 (57.5%)     390 (42.5%)     918
Waldensian          694 (99.4%)     4 (0.6%)        698

While there are very clear and highly significant differences in the distribution by dialect (χ² = 1000.3, p < 0.0001), they are not quite the differences we expect based on the description of the [wo] and [we] variants in the literature introduced in Section 4.1. The Languedoc and Provençal dialects show a mixture of forms, but that is one of the few expected results. As in the case of the front diphthongs discussed above, as well as the adjective derivation discussed in Chapter Three, the Waldensian dialect is the most different from the others, with an overwhelming majority of [wo] forms and almost no fronted [we] forms. Of the other dialects, however, only Provençal has more [wo] forms than [we] forms. For all of the other dialects, it is the fronted [we] forms that predominate, though Anglade (1977) stated that for all of these dialects except Languedoc, the [wo] form was preferred. In the data presented in Table 4.5, however, it seems that the fronted [we] form was preferred, to a greater or lesser extent, in all of the dialects except Waldensian and Provençal. While the small number of texts and tokens in the Auvernhat and Limousin dialects may be the cause of this unexpected outcome, this explanation does not hold for the Gascon and Alpine Provençal dialects, which each have enough texts and tokens of diphthongs to analyze quantitatively with some confidence.

Even if we remove the Waldensian results from the statistical analysis, as they are by far the most different from the other dialects, the pattern of [wo] and [we] variants by dialect is still statistically significant, though somewhat less strongly so (χ² = 202.9, p < 0.0001). It is clear from this brief analysis that the pattern of [wo] and [we] variants is affected by the dialect of a text. The effect of the dialect is clearest in the Waldensian dialect, but is also present in the other dialects. The effect is not, however, what we would expect from the grammars and descriptions of Old Occitan; instead, the texts show a mixture of both forms throughout all of the dialect areas except Waldensian.
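
The comparison with and without the Waldensian texts amounts to recomputing the same test after dropping one row of Table 4.5, so that the expected counts are rebuilt from the remaining six dialects. The sketch below is a minimal illustration that assumes a plain Pearson χ² without continuity correction; the helper `pearson_chi_squared` and the variable names are hypothetical.

```python
# Table 4.5 counts: ([wo], [we]) tokens by dialect of text
TABLE_4_5 = {
    "Alpine Provençal": (65, 75),
    "Auvernhat": (5, 19),
    "Gascon": (92, 266),
    "Languedoc": (435, 972),
    "Limousin": (5, 8),
    "Provençal": (528, 390),
    "Waldensian": (694, 4),
}


def pearson_chi_squared(rows):
    """Pearson chi-squared statistic (assumed: no continuity correction)."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand = sum(row_totals)
    return sum(
        (observed - rt * ct / grand) ** 2 / (rt * ct / grand)
        for row, rt in zip(rows, row_totals)
        for observed, ct in zip(row, col_totals)
    )


all_dialects = list(TABLE_4_5.values())
# Removing a row changes the pooled [wo] rate, so expected counts are recomputed.
without_waldensian = [row for name, row in TABLE_4_5.items() if name != "Waldensian"]

print(f"all seven dialects: chi2 = {pearson_chi_squared(all_dialects):.1f} (df = 6)")
print(f"Waldensian removed: chi2 = {pearson_chi_squared(without_waldensian):.1f} (df = 5)")
```

On these counts the statistic falls from roughly 1000 to roughly 203, in line with the values reported above. The drop shows how much of the overall effect the Waldensian row carries, while the remaining six dialects still differ far beyond what chance would produce.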

4.5 Analysis of Glide-Initial Diphthongs by Date of Text

In this section, I consider the use of diphthong and monophthong variants by the date of the text. I use the same methodology as that described in the previous chapters: the texts were analyzed in groups of fifty years and only non-lyric poetry and prose texts with moderately secure dates are included in this analysis, as discussed in Section 1.7.1.

In each period, all tokens of all variants of the 134 words were analyzed, as shown in Table 4.6.

Table 4.6 Front and Back Diphthongs by Date of Text

Date of text   Diphthongs       Monophthongs      Total
11th c         1 (0.5%)         192 (99.5%)       193
1100-1150      57 (8.9%)        581 (91.1%)       638
1150-1200      67 (9.0%)        681 (91.0%)       748
1200-1250      1,390 (18.9%)    5,948 (81.1%)     7,338
1250-1300      3,094 (16.6%)    15,530 (83.4%)    18,624
1300-1350      9,394 (34.1%)    18,166 (65.9%)    27,560
1350-1400      5,968 (33.9%)    11,621 (66.1%)    17,589
1400-1450      5,025 (35.5%)    9,145 (64.5%)     14,170
1450-1500      3,444 (25.6%)    10,015 (74.4%)    13,459

It is immediately clear that there is a rise in the proportion of diphthongs used throughout the Old Occitan time period. This is confirmed when we consider only the front diphthongs, as shown in Table 4.7.

Table 4.7 Front Diphthongs by Date of Text

Date of text   Diphthongs       Monophthongs      Total
11th c         0 (0%)           119 (100.0%)      119
1100-1150      49 (8.3%)        543 (91.7%)       592
1150-1200      47 (8.9%)        484 (91.1%)       531
1200-1250      1,008 (22.3%)    3,515 (77.7%)     4,523
1250-1300      2,258 (17.6%)    10,583 (82.4%)    12,841
1300-1350      5,817 (36.7%)    10,033 (63.3%)    15,850
1350-1400      3,750 (36.8%)    6,444 (63.2%)     10,194
1400-1450      3,036 (36.0%)    5,405 (64.0%)     8,441
1450-1500      2,106 (27.0%)    5,686 (73.0%)     7,792

The front diphthongs are not found in the earliest group of texts. Though it is possible that this is simply because there are so few texts and tokens from this time period, it is unlikely. The front diphthongs first appear in the 1100-1150 time period, and their use increases, with a dip in 1250-1300, until 1300-1350, after which the proportion levels off at about 36%. The increase in the use of diphthongs through the Old Occitan period is highly statistically significant (using a chi-squared test for trend, χ² = 607.91, p < 0.0001).
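
The judgment that the absence of front diphthongs in the 119 eleventh-century tokens is unlikely to be a sampling accident can be given a rough number. The sketch below assumes, purely for illustration, that each token is an independent draw at a fixed rate, and takes the 1100-1150 rate (49 of 592 tokens) as the comparison point. Because the early tokens actually cluster in a handful of texts, this independence assumption overstates the precision, so the figures are indicative only.

```python
# Assumption (illustration only): tokens are independent draws at a fixed rate.
early_tokens = 119        # front-diphthong tokens in 11th-century texts (Table 4.7)
next_rate = 49 / 592      # diphthong rate in 1100-1150 texts (Table 4.7)

# Probability of observing no diphthongs at all among 119 tokens
# if the 1100-1150 rate had already applied.
p_zero = (1 - next_rate) ** early_tokens

# Largest underlying rate still compatible with zero diphthongs at the 5% level:
# solve (1 - r) ** n = 0.05 for r.
max_rate = 1 - 0.05 ** (1 / early_tokens)

print(f"P(0 of {early_tokens} | rate {next_rate:.3f}) = {p_zero:.1e}")
print(f"rates above {max_rate:.1%} make zero diphthongs unlikely (p < 0.05)")
```

Under these assumptions, a rate anywhere near the roughly 8% seen in the following half-century would almost never yield zero diphthongs in 119 tokens (the probability is well below 0.001); only an underlying rate below about 2.5% would.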

The words for which diphthong variants are used in the early time periods reinforce the “Big Bang” approach described in Section 4.1. In texts dated between 1100 and 1200, the words that occur with diphthong variants are primarily those which were part of the “Big Bang”, such as sieu ‘his/her’, mielhs ‘best’, mieg ‘half’, and ieu ‘I’. Other words which occur with diphthong variants do so very rarely and are morphologically related to words in the core of regularity, such as sieisanta ‘sixty’ by analogy with sieis ‘six’. In the later time periods, the development of diphthongs in other words by analogy becomes apparent, as diphthongs occur in multi-syllabic words like perfiechament ‘perfectly’ and before sonorants as in bien ‘good’ in texts dated between 1200 and 1250, while all tokens of these words in earlier texts were found with monophthong variants.

The drop in the proportion of diphthong variants used in the latest time period, while striking, is probably caused by the Waldensian dialect texts. As discussed in Section 3.5, texts from the Waldensian dialect region are found only in the later time periods. In the previous section, we saw that almost no instances of the front diphthong were found in texts from the Waldensian dialect area. It is the prominence of these texts in the 1450-1500 time period, particularly the long Nouveau Testament vaudois de Zurich, that lowers the proportion of diphthongs used. If we exclude the Waldensian texts from this time period, the proportion of diphthongs used is more in line with the time periods from 1300 to 1450, though still slightly lower. Aside from the Waldensian texts and one other outlier, the Comptes consulaires de Montréal en Condomois, the distribution of diphthong and monophthong variants is fairly consistent in the 1450-1500 texts and very similar to the distribution in the preceding century and a half.

Unlike the analysis by dialect in the previous section, the trends found in the distribution of the diphthong and monophthong variants of the back diphthong are very similar to those for the front diphthong. The distribution of the diphthong and monophthong variants of the back diphthong by the date of the text is shown in Table 4.8.

Table 4.8 Back Diphthongs by Date of Text

Date of text   Diphthongs       Monophthongs      Total
11th c         1 (1.4%)         73 (98.6%)        74
1100-1150      8 (17.8%)        37 (82.2%)        45
1150-1200      20 (9.2%)        197 (90.8%)       217
1200-1250      382 (13.6%)      2,433 (86.4%)     2,815
1250-1300      836 (14.5%)      4,947 (85.5%)     5,783
1300-1350      3,577 (30.5%)    8,133 (69.5%)     11,710
1350-1400      2,218 (30.0%)    5,177 (70.0%)     7,395
1400-1450      1,989 (35.3%)    3,640 (64.7%)     5,629
1450-1500      1,338 (23.6%)    4,329 (76.4%)     5,667

Unlike the front diphthong, the back diphthong does occur in the earliest texts, but only rarely. The proportion seems to jump in the 1100-1150 time period, but this is simply because there are so few tokens. The proportion of back diphthongs used increases fairly steadily until it levels off during the 1300-1350 period, just like the front diphthong. The correlation of the date and the proportion of diphthongs used is highly statistically significant (using a chi-squared test for trend, χ² = 344.52, p < 0.0001).
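
The chi-squared test for trend used for Tables 4.7 and 4.8 asks whether the proportion of diphthongs rises or falls linearly across the ordered periods. The text does not state the scores assigned to the periods, so this sketch assumes equally spaced scores 1 to 9 and the Cochran-Armitage form of the statistic without an N/(N-1) factor; the function name `trend_chi_squared` is illustrative.

```python
import math

# (diphthongs, total tokens) per period, 11th c through 1450-1500
FRONT = [(0, 119), (49, 592), (47, 531), (1008, 4523), (2258, 12841),
         (5817, 15850), (3750, 10194), (3036, 8441), (2106, 7792)]   # Table 4.7
BACK = [(1, 74), (8, 45), (20, 217), (382, 2815), (836, 5783),
        (3577, 11710), (2218, 7395), (1989, 5629), (1338, 5667)]     # Table 4.8


def trend_chi_squared(groups, scores=None):
    """Chi-squared test for trend in proportions (Cochran-Armitage form, 1 df)."""
    if scores is None:
        scores = list(range(1, len(groups) + 1))  # assumed equally spaced period scores
    n_total = sum(n for _, n in groups)
    p_bar = sum(x for x, _ in groups) / n_total     # pooled diphthong rate
    sum_nt = sum(n * t for (_, n), t in zip(groups, scores))
    sum_xt = sum(x * t for (x, _), t in zip(groups, scores))
    sum_ntt = sum(n * t * t for (_, n), t in zip(groups, scores))
    numerator = (sum_xt - p_bar * sum_nt) ** 2
    denominator = p_bar * (1 - p_bar) * (sum_ntt - sum_nt ** 2 / n_total)
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p_value


for label, groups in (("front", FRONT), ("back", BACK)):
    chi2, p = trend_chi_squared(groups)
    print(f"{label}: chi2_trend = {chi2:.2f}, p = {p:.1e}")
```

With these assumptions the two statistics come out close to the reported χ² = 607.91 (front) and χ² = 344.52 (back). Because the test fits a single linear slope, it confirms the overall rise but does not by itself describe the plateau after 1300; that shape is read off the period-by-period percentages.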

As in the case of the front diphthongs discussed above, the diphthong variants found in the earliest texts tend to be those from the “Big Bang”: for example, uelh ‘eye’, which is the only diphthong found in the eleventh century texts, buou ‘ox’, fuoc ‘fire’, and luoc ‘place’. Words which developed diphthongs by analogy, such as bosc ‘forest’ and ueimais ‘from now on’, appear in time periods as early as 1150-1200 and increase in frequency in the later time periods.

The similarity of the front and back diphthongs is further highlighted in Figure 4.4, which shows the trends of both the front diphthong and the back diphthong.

[Figure: percentages (0% to 100%) of Front Diphthong and Back Diphthong variants by date of text]

Figure 4.4 Percentages of Front and Back Diphthongs by Date of Text

Though there are some differences in specifics, the overall trend is consistent between the front and back diphthong. It is clear that the time period in which a text is written does affect the proportion of the diphthongs used, but only up to a point. After 1300, the date of the text makes little or no difference and the use of diphthongs is much more consistent.

The date of the text does not, however, seem to affect the distribution of the [wo] and [we] variants in any principled way, as shown in Table 4.9.

Table 4.9 Number and Percentage of [wo] and [we] Variants by Date of Text

Date of text   [wo]            [we]            Total
11th c         0 (0%)          1 (100%)        1
1100-1150      4 (50.0%)       4 (50.0%)       8
1150-1200      1 (5.0%)        19 (95.0%)      20
1200-1250      138 (36.0%)     245 (64.0%)     383
1250-1300      74 (8.9%)       753 (91.1%)     827
1300-1350      985 (27.6%)     2,581 (72.4%)   3,566
1350-1400      572 (25.8%)     1,645 (74.2%)   2,217
1400-1450      967 (48.8%)     1,016 (51.2%)   1,983
1450-1500      648 (48.4%)     691 (51.6%)     1,339

Although the distribution of the [wo] and [we] variants is not consistent throughout the Old Occitan time period, neither is there any clear trend across the fifty-year periods. Rather than changing consistently in one direction or the other, the distribution shifts repeatedly, with the [wo] variants becoming more or less frequent in different time periods. Despite these differences and the widely varying distributions, it is interesting that the time periods 1300-1350 and 1350-1400 have almost the same distribution of [wo] and [we] variants, and the distributions in the time periods 1400-1450 and 1450-1500 are even more similar to one another. It is important to note, however, that the fronted [we] variant is never less frequent than [wo]: it is preferred in every time period except 1100-1150, where the two variants are equally frequent (four tokens each).

4.6 Analysis of Glide-Initial Diphthongs by Text Type

Finally, in this section, I consider the occurrence of the diphthong and monophthong variants of the glide-initial diphthongs by the type of text. I separated lyric poetry, non-lyric poetry, and prose based on Ricketts’s divisions of these texts into types for the tranches of the COM. Table 4.10 gives the number and percentage of diphthong and monophthong variants in the three types of text.

Table 4.10 Front and Back Diphthongs by Text Type

Text type          Diphthongs        Monophthongs      Total
Lyric Poetry       8,654 (29.7%)     20,526 (70.3%)    29,180
Non-lyric Poetry   14,059 (31.6%)    30,467 (68.4%)    44,526
Prose              23,843 (25.3%)    70,426 (74.7%)    94,269

While the distribution of the diphthong and monophthong variants in each of the text types looks similar, the differences are actually quite significant because of the large number of tokens.

In order to analyze the distribution of the diphthong and monophthong variants among the text types quantitatively, a modified statistical method was used in place of the chi-squared test and Fisher’s exact test used elsewhere in this thesis. This was necessary because the lyric poems often occur in multiple manuscripts that vary widely, as discussed above in Section 4.2. For the non-lyric poetry and the prose, because most of the texts appear in only one manuscript, we can simply take the reading in the COM as representing the reading in the manuscript. For the lyric poetry, however, we know only the distribution of diphthongs and monophthongs in the edition in the COM, rather than the distribution in individual manuscripts. In order to model what the manuscripts may have looked like, a parameter was used to represent the probability that, given a form in the COM, the form in that place in each manuscript is the same as the form in the COM, whether it is a diphthong variant or a monophthong variant. To avoid , we use a flat prior probability for this parameter, which I call the reliability parameter. An average of ten manuscripts is assumed for each token, based on the average number of manuscripts for each poem in a sample of poems investigated.

A full model of the manuscripts would therefore need to consider all possible tables and all possible values of the reliability parameter. The p-value for the test of independence is the product of three probabilities, integrated over these possibilities, as shown in (7). First, the probability that the variables are independent given the table produced is found using a chi-squared test. Second, given a value of the reliability parameter and a particular table, such as Table 4.10, the probability that this table is correct can be calculated using the binomial distribution. Third, the prior probability for the reliability parameter is conservatively taken to be flat from 0 to 1.

(7) p = P(independence | table) · P(table | reliability) · P(reliability)

Finally, we integrate over the possible tables and the reliability parameter using a Monte Carlo integration technique.8 Based on this modified statistical method, there are significant differences between each of the text types. That is, the pattern of diphthong usage in the lyric poetry is significantly different from that of the other text types (p < 0.0001), the prose is significantly different from the poetry (p < 0.0001), and the non-lyric poetry is significantly different from the lyric poetry and the prose (p = 0.00834). The three-way contrast between the text type patterns is also highly statistically significant (p < 0.0001). This means that the likelihood that all of the tokens in all three text types share one pattern is vanishingly small, even when we factor in the uncertainty of the tokens taken from lyric poetry. The non-lyric poetry uses the largest proportion of diphthongs, followed by the lyric poetry, and the prose texts use the smallest proportion of diphthong variants.
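
The logic of (7) can be illustrated with a small simulation. The sketch below is a hypothetical reconstruction, not the procedure actually run for this study: it assumes that every lyric token is attested in exactly ten manuscripts, that a manuscript reading which differs from the COM has the other variant, that the reliability parameter r is drawn from a flat prior on [0, 1), and that each simulated table is tested with an ordinary Pearson χ². Averaging the resulting p-values over many draws of r and of the binomially simulated lyric row is a Monte Carlo estimate of the integral of (7). The names `simulate_lyric_row` and `monte_carlo_p` are illustrative, and the output is not expected to reproduce the reported p-values exactly.

```python
import math

import numpy as np

# Table 4.10 counts: (diphthongs, monophthongs)
LYRIC = (8654, 20526)
NON_LYRIC = (14059, 30467)
PROSE = (23843, 70426)


def chi2_p_value(rows):
    """Pearson chi-squared p-value for an r x 2 table (closed forms for df = 1 or 2)."""
    row_totals = [d + m for d, m in rows]
    col_totals = [sum(d for d, _ in rows), sum(m for _, m in rows)]
    grand = sum(row_totals)
    chi2 = 0.0
    for (d, m), row_total in zip(rows, row_totals):
        for observed, col_total in ((d, col_totals[0]), (m, col_totals[1])):
            expected = row_total * col_total / grand
            chi2 += (observed - expected) ** 2 / expected
    dof = len(rows) - 1
    if dof == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if dof == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("closed-form survival function only for df = 1 or 2")


def simulate_lyric_row(rng, lyric, n_manuscripts):
    """Draw one hypothetical manuscript-level lyric row given the COM counts."""
    r = rng.uniform(0.0, 1.0)  # flat prior on the reliability parameter
    d, m = lyric
    # COM diphthongs are kept with probability r; COM monophthongs flip with 1 - r.
    diph = int(rng.binomial(d * n_manuscripts, r)
               + rng.binomial(m * n_manuscripts, 1.0 - r))
    return diph, (d + m) * n_manuscripts - diph


def monte_carlo_p(lyric, others, n_draws=2000, n_manuscripts=10, seed=0):
    """Average p-value over draws of r and of the simulated lyric row."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_draws):
        rows = [simulate_lyric_row(rng, lyric, n_manuscripts), *others]
        total += chi2_p_value(rows)
    return total / n_draws


combined = (NON_LYRIC[0] + PROSE[0], NON_LYRIC[1] + PROSE[1])
print("lyric vs. other types:", monte_carlo_p(LYRIC, [combined]))
print("three-way contrast:   ", monte_carlo_p(LYRIC, [NON_LYRIC, PROSE]))
```

Counting each manuscript reading as a token, while leaving the non-lyric and prose rows as read from the COM, follows the description above but is itself an assumption; a closer reconstruction would need the actual number of manuscripts for each poem.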

8 For details of the Monte Carlo integration, see Kalos and Whitlock (2009).

When we look at only the front diphthongs, the pattern is slightly different. The prose texts use a much smaller proportion of the front diphthongs, which is similar to the pattern found for the total diphthongs, but the difference in proportion between the prose and the other text types is much larger for the front diphthongs. The front diphthong variants are most frequent in the lyric poetry, followed closely by the non-lyric poetry, as shown in Table 4.11.

Table 4.11 Front Diphthongs by Text Type

Text type          Diphthongs        Monophthongs      Total
Lyric Poetry       6,629 (35.1%)     12,271 (64.9%)    18,900
Non-lyric Poetry   9,376 (34.2%)     18,032 (65.8%)    27,408
Prose              14,364 (24.7%)    43,735 (75.3%)    58,099

The difference in the patterns of the front diphthong and monophthong variants for two of the text types is highly statistically significant using the modified statistical method described above: the lyric poetry is significantly different from the non-lyric poetry and the prose taken together (p < 0.0001), and the prose is significantly different from the poetry (p < 0.0001). The difference in proportion of diphthong and monophthong variants between the non-lyric poetry, on the one hand, and the lyric poetry and prose combined, on the other hand, is not significant (p = 0.0234); this is probably because the pattern of the non-lyric poetry is between that of the lyric poetry and the prose. Though the distribution found in the non-lyric poetry is much more similar to that of the lyric poetry than that of the prose, the much larger number of tokens in the prose balances this out.

The three-way contrast of the patterns among the text types is also highly statistically significant (p < 0.0001). Part of the reason for the smaller proportion of front diphthongs in the prose texts is the large number of Waldensian prose texts. As we saw in Section 4.4, texts from the Waldensian dialect area use almost no front diphthongs. Most of the Waldensian texts we have are prose texts, some of which are quite long, such as the Nouveau Testament vaudois de Zurich. While the presence of these Waldensian texts does affect the distribution of diphthong and monophthong variants in the prose to some extent, it does not account for the magnitude of the difference found in Table 4.11. If we remove the Waldensian prose texts from consideration entirely and instead compare the distribution of diphthong and monophthong variants in the lyric and non-lyric poetry to that of the prose texts not known to have been written in the Waldensian dialect area, the pattern discussed above remains. This comparison is shown in Table 4.12.

Table 4.12 Front Diphthongs by Text Type excluding Waldensian Prose Texts

Text type                    Diphthongs        Monophthongs      Total
Lyric Poetry                 6,629 (35.1%)     12,271 (64.9%)    18,900
Non-lyric Poetry             9,376 (34.2%)     18,032 (65.8%)    27,408
Prose excluding Waldensian   14,353 (26.4%)    40,077 (73.6%)    54,430

Even without the Waldensian prose texts, the results of the text type analysis are very strong. The difference in distribution of the monophthong and diphthong variants of the front diphthong between the lyric poetry and the other two text types is extremely significant (p < 0.0001), as is the difference between the prose and the poetry (p < 0.0001). A comparison of the non-lyric poetry with the prose and the lyric poetry combined, however, is not significant (p = 0.0225), just as in the analysis of the front diphthong that included the Waldensian texts. The three-way contrast of the patterns among the three text types is also significant (p < 0.0001). This shows that the association of the monophthong variant of the front diphthong with the prose texts is genuine and not simply an artifact of the difference between Waldensian and the other dialects.
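
The claim that the Waldensian prose does not account for the size of the gap can be checked with simple arithmetic on Tables 4.11 and 4.12. Assuming, as the text implies, that the prose row of Table 4.12 is the prose row of Table 4.11 minus the Waldensian prose texts, the Waldensian prose counts follow by subtraction; the variable names below are illustrative.

```python
# (diphthongs, monophthongs) for the front diphthong
LYRIC = (6629, 12271)                  # Table 4.11
NON_LYRIC = (9376, 18032)              # Table 4.11
PROSE_ALL = (14364, 43735)             # Table 4.11
PROSE_NON_WALDENSIAN = (14353, 40077)  # Table 4.12


def rate(row):
    """Proportion of diphthong variants in a (diphthong, monophthong) row."""
    return row[0] / sum(row)


# Assumes Table 4.12's prose row = Table 4.11's prose row minus Waldensian prose.
waldensian_prose = tuple(a - b for a, b in zip(PROSE_ALL, PROSE_NON_WALDENSIAN))

gap_before = rate(NON_LYRIC) - rate(PROSE_ALL)
gap_after = rate(NON_LYRIC) - rate(PROSE_NON_WALDENSIAN)
share_closed = (gap_before - gap_after) / gap_before

print("Waldensian prose (diphthongs, monophthongs):", waldensian_prose)
print(f"Waldensian prose rate: {rate(waldensian_prose):.1%}")
print(f"gap to non-lyric poetry: {gap_before:.1%} -> {gap_after:.1%} "
      f"({share_closed:.0%} of the gap closed)")
```

On this reading the Waldensian prose contributes only 11 front diphthongs in 3,669 tokens (about 0.3%), and removing it closes less than a fifth of the gap between the prose and the non-lyric poetry.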

Perhaps the relative lack of front diphthongs in the prose texts is due in part to the relationship with Latin. Some of the prose texts are translations from Latin, while others, such as many of the charters, include Latin passages and code-switch between Latin and Old Occitan. Many of the charters and other administrative documents are written in a mix of Latin and Old Occitan, and these documents have very few front diphthongs in the Old Occitan parts of the texts. Prose texts with an even closer relationship to Latin are the glossaries, which gloss Old Occitan words with Latin glosses or vice versa. Because neither Classical Latin nor the Latin found in these texts had the front diphthongs, it would not be surprising if the association with Latin reduced the proportion of diphthongs used, either subconsciously, much as modern English speakers are known to shift styles when speaking to different audiences, or consciously on the part of the scribes writing or copying the manuscripts.

Though the distributions of diphthong and monophthong variants in the lyric poetry and non-lyric poetry are similar, they are still significantly different (p < 0.0001).

In addition to the difference in the overall number of diphthong and monophthong variants in each text type, there are important differences in which words are found with diphthong variants in the two types of poetry. In the lyric poetry, the vast majority of diphthong tokens are from words which are part of the “Big Bang”, and for many of these words, such as fieu ‘fief’, greu ‘heavy’, and mieu ‘my’, the proportion of diphthong variants in the lyric poetry is much higher than the proportion in the non-lyric poetry or the prose.

Though words which developed glide-initial diphthongs through analogy also occur with diphthongs in the lyric poetry, it is the words from the “Big Bang” that drive the lyric poetry to have the highest proportion of diphthong variants.

In most other words which occur with diphthongs, however, the highest proportion of diphthong variants occurs not in the lyric poetry but in the non-lyric poetry.

This difference in distribution is likely tied to the date of composition of the lyric poetry; the troubadours were active in the first half of the Old Occitan period, though many of the manuscripts of lyric poetry we have come from much later. If some of the avenues of analogy by which words outside the core of regularity developed diphthongs did not develop until later, it is unsurprising, though certainly important, that the lyric poetry includes a smaller proportion of diphthong variants in these words than in the words from the core of regularity. It is important to note, however, that the lyric poetry has a much higher proportion of diphthong variants than texts from the first half of the Old Occitan period but, like those texts, the front diphthong variants occur primarily in words from the core of regularity.

The back diphthongs, however, have an entirely different pattern. Rather than the largest percentage of diphthongs, the lyric poetry texts include the smallest proportion of back diphthong variants, significantly lower than the non-lyric poetry and prose, as shown in Table 4.13.

Table 4.13 Back Diphthongs by Text Type

Text type          Diphthongs       Monophthongs      Total
Lyric Poetry       2,025 (18.0%)    9,255 (82.0%)     11,280
Non-lyric Poetry   4,683 (27.4%)    12,435 (72.6%)    17,118
Prose              9,479 (26.2%)    26,691 (73.8%)    36,170

Table 4.13 shows a clear difference in the patterns of the back diphthong and monophthong variants in each of the text types. The lyric poetry is significantly different from the non-lyric poetry and prose (p = 0.0064), and the prose is significantly different from the poetry (p = 0.0084). The difference between the non-lyric poetry and the other text types, however, is not significant (p = 0.0132), though it is close to the threshold used for significance in this thesis. The three-way contrast of the patterns among the three text types is also statistically significant (p = 0.0002). It is interesting that in the back diphthongs, the non-lyric poetry texts pattern with the prose texts. The difference between the pattern in only the non-lyric poetry and that in only the prose is still statistically significant (χ² = 7.892, p = 0.0052), but much less so than the differences between all three text types.

It is more difficult to explain why the lyric poetry would be associated with the monophthong variants of the back diphthongs, given the poets’ well-known awareness of language, word play, and rhyme, but, based on the COM corpus, it is nonetheless true.

Perhaps it is linked to the differences in the distributions of the front and back diphthongs by dialect discussed in Section 4.4; Jensen (1995) suggests that the koine of the lyric poetry was based on the Languedoc dialect, and the Languedoc dialect has one of the lower proportions of back diphthong variants. Or perhaps the back diphthongs were less salient to the lyric poets. While the proportion of front diphthong variants in the lyric poetry is significantly higher than the proportion in any of the time periods before 1300, this is not the case for the back diphthongs. The diphthong variants do account for a larger percentage of tokens in the lyric poetry than in these time periods, but the gap is not nearly as wide as in the case of the front diphthongs: it is less than five percentage points in the 1100-1150, 1200-1250, and 1250-1300 time periods. It is possible that the innovative diphthong variants of the front mid vowels became associated with the lyric poetry, accounting for the higher use of front diphthong variants, but the innovative variants of the back mid vowels did not have this association.

Differences in the pattern of the front and back diphthongs in the lyric poetry are also found when we look at the words in which the diphthongs occur. As in the case of the front diphthongs, the lyric poetry does include a larger proportion of diphthong variants in words from the core of regularity than in words which developed diphthongs by various avenues of analogy. These proportions of diphthong variants of words in the core of regularity, however, are not significantly larger than the proportions of diphthong variants in these words in the other text types. Therefore the lower proportion of diphthong variants in the lyric poetry than that of the non-lyric poetry and the prose can be more accurately explained as a lower proportion of diphthong variants in words which developed diphthongs by analogy. This is in contrast to the front diphthongs, where, though words which developed diphthongs by analogy were found with a lower proportion of diphthong variants, the very high proportions of diphthong variants in words in the core of regularity drove the overall proportion of front diphthong variants up, past the proportion found in the non-lyric poetry.

Finally, the distribution of the [wo] and [we] variants of the back diphthong was considered, as shown in Table 4.14.

Table 4.14 Number and Percentage of [wo] and [we] Variants by Text Type

Text type          [wo]             [we]             Total
Lyric Poetry       671 (33.6%)      1,328 (66.4%)    1,999
Non-lyric Poetry   755 (16.2%)      3,896 (83.8%)    4,651
Prose              3,847 (40.6%)    5,631 (59.4%)    9,478

The [we] variant, though the more common variant in all three text types, accounts for a significantly larger proportion of the tokens in the non-lyric poetry than in the lyric poetry or the prose. The relatively low proportion of [we] variants in the prose, however, is partially an artifact of the dialect difference discussed above. The texts from the Waldensian dialect area show an overwhelming proportion of [wo] variants, but this does not entirely account for the difference. If the Waldensian texts are removed from the analysis, the prose texts still have a much larger proportion of [wo] variants than the other two text types.

4.7 Conclusion

As in the discussion of adjective derivation in Chapter Three, here again all three parameters considered influence the use of the monophthong and diphthong variants of the glide-initial diphthongs. Different dialects have different patterns of diphthong usage, particularly with the front diphthongs. The most striking difference is in the Waldensian dialect, where the front diphthong variants are almost entirely absent, but there are other differences as well. The dialect also affects the distribution of diphthong variants in the back diphthongs, but in very different ways. The diphthong variants of the back vowels are most strongly associated with the eastern dialect areas, Provençal and Alpine Provençal, and to a lesser extent Limousin and Waldensian, while texts in the Languedoc dialect and especially the Gascon dialect use a much smaller proportion of diphthong variants. Though the overall proportion of diphthong variants in the Waldensian dialect looks very similar to that of the Limousin dialect, the back vowels pattern fundamentally differently in the Waldensian dialect, with diphthongs occurring in only four lemmas. It is also important to note that though the distribution of the [wo] and [we] variants of the back diphthong does seem to be at least partially conditioned by the dialect area in which the text was written, the distribution found in Table 4.5 above is not at all what we would expect based on the grammars and descriptions of Old Occitan. It is clear that, in all of the dialects except Waldensian, a mixture of the [wo] and [we] variants is found, and that texts in most of the dialects use more of the [we] variants than the [wo] variants, contrary to the descriptions given.

Unsurprisingly, the date of the text also affects the extent to which diphthongs are found in it. Earlier texts have a very small percentage of diphthongs, which increases until the early fourteenth century. After that date, the usage of diphthong and monophthong variants is fairly consistent through the end of the Old Occitan time period. Because the development of the glide-initial diphthongs was a fairly recent change during the Old Occitan period, it is not surprising that it might take time for the innovative variant to become widespread in writing, though the back diphthong certainly does occur in the earliest Old Occitan texts.

Finally, the type of text affects the distribution of diphthong and monophthong variants of the glide-initial diphthongs in more subtle but still interesting ways. For the front diphthongs, the use of the diphthong variants is associated with the poetry, both lyric and non-lyric, while the monophthong variants are more strongly associated with the prose, though it is important to note that the majority of tokens in all text types are monophthong variants.

For the back diphthongs, on the other hand, the associations are entirely different; the diphthong variants are associated with the non-lyric poetry and the prose, while the lyric poetry is more closely associated with the monophthong variants of the back diphthong. Though this association is less readily explained, it is still clear that the back diphthongs occur in a larger proportion of tokens in the non-lyric poetry and prose texts than in the lyric texts.

As in the previous two chapters, the type of text influences the usage of diphthong variants of both the front and back diphthongs, though in different ways. As in the discussion of adjective derivation in Chapter Three, however, it is not the only factor that does so. Instead, the date, dialect, and text type all influence the proportion of diphthong variants. There are likely many other factors that influence the use of this linguistic feature as well, but the data from the COM show that the type of text does correlate with linguistic features and patterns.

Chapter 5: Conclusion

5.1 Summary and Discussion

This study considered the patterns in which the variants of three linguistic features occur: adjectival degree, especially the formation of the comparative; adjectival derivation; and the diphthongization of mid vowels that produced the glide-initial diphthongs.

From the study of these three features, two important conclusions can be drawn. The first is that the patterns of variation seem to be influenced by the type of text; the second is that the text type differences are not the only differences found in the texts.

In all three of the features studied here, the patterns of variation in the lyric poetry, non-lyric poetry, and prose were significantly different from one another. For the comparative adjectives, the text type was the only one of the three parameters considered that gave insight into the pattern of usage of the analytic and synthetic comparative adjectives, as the less frequent synthetic forms were strongly correlated with the lyric poetry. In the case of the adjective derivation, adjectives derived with -esc and names derived with -enc were used more frequently in the lyric poetry than in the non-lyric poetry or prose. The adjectives in -esc, like the synthetic comparative adjectives such as ausor, are less frequent forms in the language overall, and the respective preferences for these two types of words seem to be linked. In the lyric poetry, words that have a low frequency in the full Old Occitan corpus are used more frequently than their overall frequency would suggest. This tendency is unsurprising given the troubadours’ role as highly educated entertainers and their well-known fondness for word play. These differences are primarily lexical, but they interact with the patterns in the grammar of the language.

When considering the diphthongs, on the other hand, the distributions of the front and back diphthongs by text type diverged sharply from one another. The differences in monophthong and diphthong usage among the text types were highly statistically significant, but the patterns themselves were very different. The front diphthong [je] occurred at a similar rate in the lyric and non-lyric poetry (about 35%), but at a much lower rate in the prose (about 25%). The back diphthong [wo]/[we], on the other hand, was used at a similar rate in the non-lyric poetry and the prose (26-27%), but a much smaller proportion of back diphthong variants was used in the lyric poetry (18%). That the distributions of the front and back variants among the text types are so different from one another is interesting in itself, and it suggests that the two diphthongizations followed separate patterns of variation and spread. Smith and Bergin (1984) suggest that the back mid vowels developed into diphthongs first and the front mid vowels developed into diphthongs later by analogy. A separate starting point or “Big Bang” for the front and back diphthong developments would help to explain the different patterns of variation and use, but it is also possible that the two sets of diphthongs had the same starting point and developed different patterns as they were adopted by later generations of speakers. In any case, the pattern of diphthong and monophthong variant usage clearly differs among the text types. The front diphthongs are associated with poetry, both lyric and non-lyric, though they are used slightly more frequently in the lyric poetry than in the non-lyric poetry. The back diphthongs, on the other hand, are associated with the non-lyric poetry and the prose, though it might be more accurate to say that the lack of the back diphthong is associated with the lyric poetry.

The multiple features studied make it clear that text type differences are not confined to one component of the grammar; text type differences were found in morphosyntax, word derivation, and phonology. While it would be extreme to say that these differences in patterns represent fundamentally different systems, the differences found are significant in both depth and breadth. In addition, studies of other languages have found text type differences in syntax and morphology. For example, Hock (2000) argues that some syntactic differences between the Rig-Vedic poetry and the later Vedic prose texts are better attributed to text type differences than to syntactic change. Similarly, Herring (2000) argues that the word order in Old Tamil texts is conditioned by the degree of “poeticality” of the text.

The second conclusion, that text type is not the only factor that affects the patterns of variation, is not at all surprising but is still crucially important. In each of the case studies presented here, other factors influenced the use of the features in question. In the case of the comparative adjectives, while the date and dialect of the text did not have a clear correlation with the use of the analytic or synthetic forms, other factors did, such as the individual style of the writer and the content of the text. The latter factor, demonstrated by the religious texts of the fifteenth century in Section 2.5, would be a fruitful extension of this project. Considering the patterns of variation of these features by the content or purpose of the text would be an interesting question in its own right, but it is different from the questions addressed here.

Before being able to do so, however, it is important to have established that there are patterns in the first place, as has been done here with the necessary first step of considering text type.

For adjective derivation and the glide-initial diphthongs, the date and the dialect of a text also affect the distribution of the variants considered. Adjectives derived with -art and -esc decrease in frequency to some extent in the latter half of the Old Occitan period. The glide-initial diphthongs, both front and back, increase in frequency steadily until the first half of the fourteenth century, when the frequency of usage levels out. The latter date-based difference likely reflects the time necessary for an innovative variant to become fixed in the writing practices of speakers. The former difference, on the other hand, reflects changes in both the productivity of the affixes in question and the lexical choices of speakers over time. These chronological changes interact with the text type differences in subtle ways. I would expect the chronological differences to interact with the content of the texts as well, especially the prose texts, so that grammars, religious texts, and administrative texts might all pattern somewhat differently; that question, however, is not addressed in this study, which focuses on the larger categories of text types.


The dialect region in which the texts were written has an important impact on the patterns of variation found in the texts. This is particularly true of the Waldensian dialect, discussed below in Section 5.2, but it is also true of the other dialects. The front glide-initial diphthongs were most frequently used in the central dialects of Languedoc and Provençal, less in Gascon, and even less in the northern dialects, while the back glide-initial diphthongs were more frequently used in the eastern dialects of Provençal and Alpine Provençal. Like the chronological differences, these dialect differences interact with the text type differences and other parameters to create a complex web of associations and tendencies to which Old Occitan writers were sensitive.

5.2 The Waldensian dialect

The Waldensian dialect presents important differences from the other Old Occitan dialects. In two of the three case studies presented here, the language of the Waldensian texts differed dramatically from that of the other dialects. The vast majority of the texts that use the adjective-deriving suffix -ivol can be located in the Waldensian dialect area, and the fact that many of these adjectives occur only once and seem very transparent makes it likely that these are new derivations by means of a very productive derivational affix.

When considering the glide-initial diphthongs, the Waldensian texts avoid the front diphthong variants almost entirely, while using a proportion of back diphthong variants that is approximately average among the dialects. This lack of the front diphthongs, which are quite common in the other dialects, again marks the Waldensian dialect as quite different from the other surviving Old Occitan texts.

All of the surviving texts are important sources of data, so the Waldensian texts should not be excluded from research on the Old Occitan language and its development. These texts, however, must be treated with care, and studies including them must be done with an awareness of how the fundamental differences between the Waldensian dialect and the other Old Occitan dialects may affect the outcome. In Sections 3.5 and 3.6, the difference between the Waldensian dialect and the rest of the Old Occitan corpus seemed to surface as a chronological trend as well as a text type difference. These apparent differences were simply an artifact of the historical accident that the surviving Waldensian corpus consists mostly of prose texts from the latter half of the Old Occitan period. Even when we account for the dialect difference, the date and text type of the texts are found to correlate with differences in the pattern of adjective derivation, but it is very important to remove the effect of the large differences in the dialect of the texts in order to have an accurate analysis. There are assuredly other groups of texts in the Old Occitan corpus and in the corpora of other languages that must be treated with care, but this study highlights the Waldensian dialect as one of these groups of texts.


5.3 The Composition of the Lyric Poetry

The lyric poetry of the troubadours poses special problems for analyses of Old Occitan texts by date and dialect. The poems, written for entertainment by highly educated poets, are written in what has been described as a koine with a mix of features of different Old Occitan dialects. The Limousin dialect area was once considered to be the geographical source of the basis of this dialect mixture, but more recent analyses focusing primarily on the phonology of the texts have pointed instead to the Languedoc dialect area as the probable source of the language used in the poetry. We can test this connection by comparing the patterns of variation found in the lyric poetry with those of the Languedoc dialect and other dialects as discussed in Chapters Three and Four. The analysis of comparative adjective formation presented in Chapter Two cannot shed any light on the possible connection between the lyric poetry and the dialects, because the dialect of the texts was not found to play a role in the formation of comparative adjectives.

When the patterns found in the lyric poetry are compared with those found in the various dialect areas, there are some clear links with the Languedoc dialect, but the distribution of adjectives derived with each suffix discussed in Chapter Three is most similar not to that found in the Languedoc dialect but to that found in the Gascon dialect, as shown in Table 5.1.


Table 5.1 Adjective Lemmas and Tokens by Dialect of Text Compared with the Lyric Poetry

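| Dialect | Lemmas -art | Lemmas -enc | Lemmas -esc | Lemmas -ivol | Lemmas total | Tokens -art | Tokens -enc | Tokens -esc | Tokens -ivol | Tokens total |
|---|---|---|---|---|---|---|---|---|---|---|
| Alpine Provençal | 6 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 6 | 20 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 20 |
| Auvernhat | 2 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 2 | 2 (100%) | 0 (0%) | 0 (0%) | 0 (0%) | 2 |
| Gascon | 27 (43.5%) | 10 (16.1%) | 25 (40.3%) | 0 (0%) | 62 | 132 (55.7%) | 32 (13.5%) | 73 (30.8%) | 0 (0%) | 237 |
| Languedoc | 23 (52.2%) | 16 (36.3%) | 5 (11.3%) | 0 (0%) | 44 | 116 (62.3%) | 56 (30.1%) | 14 (7.5%) | 0 (0%) | 186 |
| Limousin | 2 (66.6%) | 1 (33.3%) | 0 (0%) | 0 (0%) | 3 | 2 (66.6%) | 1 (33.3%) | 0 (0%) | 0 (0%) | 3 |
| Provençal | 16 (53.3%) | 6 (20.0%) | 8 (26.6%) | 0 (0%) | 30 | 38 (52.0%) | 18 (24.6%) | 17 (23.2%) | 0 (0%) | 73 |
| Waldensian | 6 (4.8%) | 42 (33.9%) | 0 (0%) | 76 (61.3%) | 124 | 17 (2.3%) | 141 (19.3%) | 0 (0%) | 572 (78.4%) | 730 |
| Lyric Poetry | 18 (37.5%) | 13 (27.1%) | 17 (35.4%) | 0 (0%) | 48 | 84 (53.2%) | 25 (15.8%) | 49 (31.0%) | 0 (0%) | 158 |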

The proportion of words derived with each suffix in the lyric poetry, including its high proportion of words derived with -esc, is strikingly similar1 to that found in the texts from the Gascon dialect area. Striking as this is, the patterns found in the Gascon and Languedoc dialect areas are quite similar to one another, and the pattern found in the lyric poetry differs from that of the Languedoc dialect mainly in its higher proportion of words derived with -esc.

In the names, however, the pattern of suffixes found in the lyric poetry is most similar to that of the Auvernhat dialect in the lemmas but the Provençal dialect in the tokens, as shown in Table 5.2.

1 A Fisher’s exact test shows that the distribution in the Gascon dialect and that of the lyric poetry texts are not significantly different (p = 0.4086). When each dialect is compared with the lyric poetry individually, the comparison of the Gascon dialect with the lyric poetry yields by far the largest p-value.

Table 5.2 Name Lemmas and Tokens by Dialect of Text Compared with the Lyric Poetry

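| Dialect | Lemmas -art | Lemmas -enc | Lemmas -esc | Lemmas total | Tokens -art | Tokens -enc | Tokens -esc | Tokens total |
|---|---|---|---|---|---|---|---|---|
| Alpine Provençal | 13 (81.3%) | 3 (18.8%) | 0 (0%) | 16 | 38 (79.2%) | 10 (20.8%) | 0 (0%) | 48 |
| Auvernhat | 5 (62.5%) | 3 (37.5%) | 0 (0%) | 8 | 12 (66.7%) | 6 (33.3%) | 0 (0%) | 18 |
| Gascon | 50 (70.4%) | 19 (26.8%) | 2 (2.8%) | 71 | 542 (87.1%) | 68 (10.9%) | 12 (1.9%) | 622 |
| Languedoc | 59 (77.6%) | 16 (21.1%) | 1 (1.3%) | 76 | 412 (91.6%) | 37 (8.2%) | 1 (0.2%) | 450 |
| Limousin | 1 (33.3%) | 0 (0%) | 2 (66.7%) | 3 | 1 (33.3%) | 0 (0%) | 2 (66.7%) | 3 |
| Provençal | 52 (85.3%) | 8 (13.1%) | 1 (1.6%) | 61 | 317 (96.4%) | 11 (3.3%) | 1 (0.3%) | 329 |
| Waldensian | 1 (3.6%) | 27 (96.4%) | 0 (0%) | 28 | 75 (51.4%) | 71 (48.6%) | 0 (0%) | 146 |
| Lyric Poetry | 13 (65.0%) | 6 (30.0%) | 1 (5.0%) | 20 | 219 (96.5%) | 7 (3.1%) | 1 (0.4%) | 227 |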

The pattern of name creation found in the lyric poetry is not, however, significantly different from the pattern found in the Languedoc dialect for either the lemmas (p = 0.2547) or the tokens (p = 0.03613). Based on the pattern of adjective derivation, the lyric poetry cannot be shown to be more closely connected with the Languedoc dialect than with any of the other dialects, but neither are the patterns different enough to refute the claim that the lyric poetry is most closely connected with that dialect area.

When we turn to the glide-initial diphthongs, however, the distribution of diphthong and monophthong variants in the lyric poetry is extremely similar to that of the texts from the Languedoc dialect area, particularly for the front diphthongs. This can be clearly seen by comparing the row for the Languedoc dialect in Table 5.3 with that of the lyric poetry.

Table 5.3 Glide-Initial Diphthongs by Dialect of Text Compared with Lyric Poetry

| Dialect | Front [je]: diphthongs | Front [je]: monophthongs | Front [je]: total | Back [wo]/[we]: diphthongs | Back [wo]/[we]: monophthongs | Back [wo]/[we]: total |
|---|---|---|---|---|---|---|
| Alpine Provençal | 47 (7.6%) | 575 (92.4%) | 622 | 140 (33.6%) | 277 (66.4%) | 417 |
| Auvernhat | 19 (6.9%) | 257 (93.1%) | 276 | 24 (16.9%) | 118 (83.1%) | 142 |
| Gascon | 1,532 (27.1%) | 4,113 (72.9%) | 5,645 | 356 (7.7%) | 4,293 (92.3%) | 4,649 |
| Languedoc | 2,715 (35.2%) | 5,001 (64.8%) | 7,716 | 1,406 (19.8%) | 5,707 (80.2%) | 7,113 |
| Limousin | 6 (11.3%) | 47 (88.7%) | 53 | 13 (31.0%) | 29 (69.1%) | 42 |
| Provençal | 1,481 (33.8%) | 2,899 (66.2%) | 4,380 | 950 (41.7%) | 1,327 (58.3%) | 2,277 |
| Waldensian | 12 (0.3%) | 3,818 (99.7%) | 3,830 | 700 (21.6%) | 2,544 (78.4%) | 3,244 |
| Lyric Poetry | 6,629 (35.1%) | 12,271 (64.9%) | 18,900 | 2,025 (18.0%) | 9,255 (82.0%) | 11,280 |

The percentage of front diphthong variants in the lyric poetry differs from that of the Languedoc dialect by only one tenth of a percentage point. Though the proportion of back diphthong variants is further from that found in the Languedoc dialect, differing by almost two percentage points, overall the pattern of usage of the glide-initial diphthongs in the lyric poetry is clearly far more similar to the distribution found in the Languedoc dialect than to that of any other dialect. The analysis of the glide-initial diphthongs therefore supports the idea that the source of the language used in the lyric poetry may be found in the Languedoc dialect. Though the comparison of the patterns of adjective derivation in the lyric poetry with those in each dialect does not strongly support or refute this idea, the comparison of the distribution of diphthong and monophthong variants of the glide-initial diphthongs clearly affirms the connection between the lyric poetry and the Languedoc dialect area.
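The "overall" comparison with Table 5.3 can be made explicit with a small calculation. The Python sketch below is an illustration added here, not part of the original analysis: it recomputes the diphthong percentages from the counts in Table 5.3 and ranks the dialects by a simple distance from the lyric poetry, namely the sum of the absolute differences in the front and back percentages. That distance measure and the helper names are assumptions chosen for clarity; the measure is descriptive only and is not a significance test.

```python
# Counts from Table 5.3: (front [je] diphthongs, front total,
#                         back [wo]/[we] diphthongs, back total)
COUNTS = {
    "Alpine Provençal": (47, 622, 140, 417),
    "Auvernhat": (19, 276, 24, 142),
    "Gascon": (1532, 5645, 356, 4649),
    "Languedoc": (2715, 7716, 1406, 7113),
    "Limousin": (6, 53, 13, 42),
    "Provençal": (1481, 4380, 950, 2277),
    "Waldensian": (12, 3830, 700, 3244),
}
LYRIC_POETRY = (6629, 18900, 2025, 11280)


def rates(row):
    """Return the (front %, back %) share of diphthong variants for one row."""
    front_d, front_total, back_d, back_total = row
    return 100 * front_d / front_total, 100 * back_d / back_total


def distance(a, b):
    """Sum of absolute differences in percentage points (an illustrative choice)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def rank_dialects(counts, reference):
    """Return (distance, dialect) pairs, most similar to the reference first."""
    ref = rates(reference)
    return sorted((distance(rates(row), ref), name) for name, row in counts.items())


for dist, name in rank_dialects(COUNTS, LYRIC_POETRY):
    front, back = rates(COUNTS[name])
    print(f"{name:<18} front {front:5.1f}%  back {back:5.1f}%  distance {dist:5.1f}")
```

Under this measure the Languedoc texts come out closest, at under two percentage points, with Gascon a distant second. Because the measure combines both diphthongs, it corresponds to the overall comparison made in the text; ranked on the back diphthong alone, the small Auvernhat sample (142 tokens, 16.9%) would come slightly closer to the lyric poetry than Languedoc does.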

Turning now to the question of the date of the lyric poetry, we have good evidence from the historical record that the troubadours lived and composed relatively early in the Old Occitan period. If we consider the pattern of adjective derivation by date from Chapter Three, it does not look as if the pattern in the lyric poetry fits with that of the early time periods of Old Occitan, as shown in Table 5.4.

Table 5.4 Adjective Lemmas and Tokens by Date of Text Compared with Lyric Poetry

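| Date | Lemmas -art | Lemmas -enc | Lemmas -esc | Lemmas -ivol | Lemmas total | Tokens -art | Tokens -enc | Tokens -esc | Tokens -ivol | Tokens total |
|---|---|---|---|---|---|---|---|---|---|---|
| 11th c | 0 (0%) | 0 (0%) | 5 (100%) | 0 (0%) | 5 | 0 (0%) | 0 (0%) | 7 (100%) | 0 (0%) | 7 |
| 1100-1150 | 0 (0%) | 0 (0%) | 1 (100%) | 0 (0%) | 1 | 0 (0%) | 0 (0%) | 7 (100%) | 0 (0%) | 7 |
| 1150-1200 | 14 (82.6%) | 3 (17.7%) | 0 (0%) | 0 (0%) | 17 | 20 (87.0%) | 3 (13.0%) | 0 (0%) | 0 (0%) | 23 |
| 1200-1250 | 34 (57.6%) | 11 (18.6%) | 14 (23.7%) | 0 (0%) | 59 | 104 (68.9%) | 19 (12.6%) | 28 (18.5%) | 0 (0%) | 151 |
| 1250-1300 | 61 (67.0%) | 18 (19.8%) | 10 (11.0%) | 2 (2.2%) | 91 | 289 (75.5%) | 65 (17.0%) | 27 (7.1%) | 2 (0.5%) | 383 |
| 1300-1350 | 63 (31.7%) | 44 (22.1%) | 31 (15.6%) | 61 (30.7%) | 199 | 276 (36.6%) | 137 (18.2%) | 71 (9.4%) | 271 (35.9%) | 755 |
| 1350-1400 | 23 (34.3%) | 14 (20.9%) | 15 (22.4%) | 15 (22.4%) | 67 | 178 (62.0%) | 47 (16.4%) | 45 (15.7%) | 17 (5.9%) | 287 |
| 1400-1450 | 22 (18.8%) | 29 (24.8%) | 3 (2.6%) | 63 (53.9%) | 117 | 54 (11.1%) | 65 (13.4%) | 3 (0.6%) | 365 (75.0%) | 487 |
| 1450-1500 | 26 (23.9%) | 36 (33.0%) | 3 (2.8%) | 44 (40.4%) | 109 | 114 (29.2%) | 86 (22.0%) | 10 (2.6%) | 181 (46.3%) | 391 |
| Lyric Poetry | 18 (37.5%) | 13 (27.1%) | 17 (35.4%) | 0 (0%) | 48 | 84 (53.2%) | 25 (15.8%) | 49 (31.0%) | 0 (0%) | 158 |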


While the use of -esc does decrease during the Old Occitan period, the lyric poetry uses a larger proportion of adjectives derived with -esc than the early time periods in which there are enough tokens to draw some conclusions, that is, larger than the proportion in texts between 1150 and 1300. If the decrease in the use of -esc started before the Old Occitan period, the higher proportion of -esc adjectives found in the lyric poetry may reflect an earlier distribution of adjective-deriving suffixes, but that is simply speculation. In addition, the lyric poetry uses fewer adjectives derived with -art and more adjectives derived with -enc than the early time periods. Looking at the adjective derivation does not give any strong indication of the date of the composition of the lyric poetry.

The analysis of the texts by date in Chapter Four, however, makes it clear that, in some ways, the patterns found in the lyric poetry fit well with those found in the early time periods of Old Occitan. When we compare the distribution of diphthong and monophthong variants found in the lyric poetry with the pattern of distribution for each time period, the results both confirm the early date of composition of the lyric poetry and clearly set the lyric poetry texts apart from other texts written during the early periods of Old Occitan, as shown in Table 5.5.


Table 5.5 Glide-Initial Diphthongs by Date of Text Compared with Lyric Poetry

| Date | Front [je]: diphthongs | Front [je]: monophthongs | Front [je]: total | Back [wo]/[we]: diphthongs | Back [wo]/[we]: monophthongs | Back [wo]/[we]: total |
|---|---|---|---|---|---|---|
| 11th c | 0 (0%) | 119 (100.0%) | 119 | 1 (1.4%) | 73 (98.6%) | 74 |
| 1100-1150 | 49 (8.3%) | 543 (91.7%) | 592 | 8 (17.8%) | 37 (82.2%) | 45 |
| 1150-1200 | 47 (8.9%) | 484 (91.1%) | 531 | 20 (9.2%) | 197 (90.8%) | 217 |
| 1200-1250 | 1,008 (22.3%) | 3,515 (77.7%) | 4,523 | 382 (13.6%) | 2,433 (86.4%) | 2,815 |
| 1250-1300 | 2,258 (17.6%) | 10,583 (82.4%) | 12,841 | 836 (14.5%) | 4,947 (85.5%) | 5,783 |
| 1300-1350 | 5,817 (36.7%) | 10,033 (63.3%) | 15,850 | 3,577 (30.5%) | 8,133 (69.5%) | 11,710 |
| 1350-1400 | 3,750 (36.8%) | 6,444 (63.2%) | 10,194 | 2,218 (30.0%) | 5,177 (70.0%) | 7,395 |
| 1400-1450 | 3,036 (36.0%) | 5,405 (64.0%) | 8,441 | 1,989 (35.3%) | 3,640 (64.7%) | 5,629 |
| 1450-1500 | 2,106 (27.0%) | 5,686 (73.0%) | 7,792 | 1,338 (23.6%) | 4,329 (76.4%) | 5,667 |
| Lyric Poetry | 6,629 (35.1%) | 12,271 (64.9%) | 18,900 | 2,025 (18.0%) | 9,255 (82.0%) | 11,280 |

The distribution of the diphthong and monophthong variants of the back diphthongs in the lyric poetry is very similar to that of the thirteenth-century texts. It is immediately clear, however, that the lyric poetry has a far higher proportion of front diphthong variants than the texts before 1300. The front diphthongs are not found in the earliest time period, but they appear rarely in the texts dated between 1100 and 1150 and increase in frequency thereafter. The lyric poetry, however, has a proportion of front diphthong variants (over one third) nearly as high as the level at which the front diphthong variants leveled off after 1300. While this may, at first glance, seem to argue against the early composition of the lyric poetry, a closer look at the words in which the variation occurs actually confirms the connection between the earlier periods of Old Occitan and the lyric poetry.
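The comparison of front diphthong rates by period can likewise be checked directly against the counts in Table 5.5. The Python sketch below is an illustration, not part of the original analysis: it computes the percentage of [je] variants for each period and lists the periods whose rate is at least as high as that of the lyric poetry. The helper names are hypothetical, and treating the "11th c" label as starting in 1000 is an assumption made only so that the periods can be compared by date.

```python
# Front diphthong [je] counts by date from Table 5.5: (diphthongs, total tokens)
FRONT_BY_DATE = {
    "11th c": (0, 119),
    "1100-1150": (49, 592),
    "1150-1200": (47, 531),
    "1200-1250": (1008, 4523),
    "1250-1300": (2258, 12841),
    "1300-1350": (5817, 15850),
    "1350-1400": (3750, 10194),
    "1400-1450": (3036, 8441),
    "1450-1500": (2106, 7792),
}
LYRIC_FRONT = (6629, 18900)


def pct(diphthongs, total):
    """Percentage of diphthong variants."""
    return 100 * diphthongs / total


def start_year(period):
    """First year of a period label; '11th c' is treated as 1000 (assumption)."""
    return 1000 if period == "11th c" else int(period.split("-")[0])


def periods_at_or_above(reference, by_date):
    """Periods whose front diphthong rate is at least the reference rate."""
    ref = pct(*reference)
    return [p for p, counts in by_date.items() if pct(*counts) >= ref]


lyric_rate = pct(*LYRIC_FRONT)
before_1300 = max(pct(*c) for p, c in FRONT_BY_DATE.items() if start_year(p) < 1300)
print(f"lyric poetry: {lyric_rate:.1f}%")
print(f"highest rate before 1300: {before_1300:.1f}%")
print("periods at or above the lyric rate:", periods_at_or_above(LYRIC_FRONT, FRONT_BY_DATE))
```

With these counts, no period before 1300 exceeds about 22.3% front diphthong variants, while the lyric poetry reaches 35.1%, a rate matched or exceeded only by the periods from 1300 to 1450.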

5.4 The Contribution of the Prose Texts

The addition of the prose texts to the COM makes it an invaluable resource. Not only does it allow text type studies such as the one reported here, but the inclusion of the prose texts in electronic form also allows quantitative research to be done much more effectively. Many of the texts that can be securely located and dated, and can thus be used in date and dialect analyses, are prose texts, so such analyses would be very difficult without them.

The inclusion of these texts in a digital, searchable format gives us a larger corpus for large-scale corpus studies. This is useful from a simply numerical point of view, giving a much larger number of texts and words to consider in our study of Old Occitan. This larger number of texts and words makes computational techniques and statistical studies of Old Occitan more reliable, partly because it is less likely that our understanding of Old Occitan will be skewed by individual authors or texts, or by the practices of one individual type of text. The inclusion of the prose texts, with their widely varied topics, purposes, and formats, provides crucial evidence for determining which aspects of the language belong to Old Occitan as a whole and which appear in only one type of text. For example, as discussed in Section 2.5, synthetic comparative adjectives such as ausor and lonhor are best described as restricted to lyric poetry rather than as part of the language as a whole, while suppletive synthetic adjectives such as melhor and menor are used throughout the texts, regardless of text type. Though less dramatic than Hock’s (2000) findings in Sanskrit syntax, in which apparent chronological differences were instead based on the type of text, the synthetic comparative adjectives provide an example of the same kind.

The features presented in this thesis represent only the beginning of comparisons across text types, and of what such comparisons can tell us about usage in the Old Occitan language as a whole and about the usage associated with particular text types or purposes.

The prose texts are important not only for their number and the variety they bring to the Old Occitan corpus, but also because some, though certainly not all, of them may be closer to speech, since they avoid the careful rhyme, meter, and performativity of the lyric poetry. As Schneider (2002) noted, historical linguistics has often relied on poetic and formal styles, as these are the oldest extant texts in many languages, but the study of variation and change requires a focus on vernacular styles. Many of the prose texts in Old Occitan are not, strictly speaking, vernacular texts, but they are much closer to vernacular texts than the lyric poetry or even the non-lyric poetry, and their inclusion makes research on variation and change in Old Occitan possible.

The prose texts are not without their own concerns, however. The discussion of comparative adjectives in Section 2.5, for example, showed the effect of “mention” in the grammars and glossaries of Old Occitan, which are included in the prose. In addition, some of these texts, such as the charters, include some Latin within the document, code-switch between Latin and Old Occitan, or are translated from Latin, raising the question of Latin influence. Consequently, while the prose texts are very important resources for our understanding of Old Occitan, as well as fascinating texts, they must be treated with caution in quantitative analyses, as has been done here.

5.5 Variation and Grammar

There is no single, simple set of rules or descriptions that accurately characterizes Old Occitan, especially when we take into account the differences between the text types. Instead, the Old Occitan corpus, as represented in the COM, involves a range of variants that occur with greater or lesser probability. Some of the patterns of variation can be explained using parameters such as dialect and text type, as I have done here. This gives a much more accurate and complete picture of the language used in the texts than many existing descriptions of Old Occitan provide. Because the texts come from such a long time period and such a large geographical area, it is not at all clear that such parameterization in any way reconstructs the patterns used by any given individual speaker.2 Instead, it creates a construct, based on the aggregate of the productions of many speakers with very different regional dialects, education levels, and so on, that represents the language as a whole, a language that never really existed as such. Such constructs are useful for describing and analyzing language, but should not be confused with the grammar of individual speakers. While there is an important connection between community grammars and individual grammars, the two should not be conflated, and, though I have at times made reference to individual texts and the authors who wrote them, the focus has been on the distribution of variation within a community grammar rather than on the grammar, or text, of an individual speaker.

2 While we know little or nothing about the authors of some texts, there are other texts in the Old Occitan corpus for which we know a great deal about the author. These texts could be grouped by author and compared in the same way that the content and goal of the texts could be considered, as mentioned in Section 5.1, though it is likely that the number of texts and words by most authors would be too small for quantitative studies.

In addition, as in the case of the comparative adjectives discussed in Chapter 2, some differences are not what they seem at first glance. The text type differences in the use of comparative adjectives such as ausor, belazor, and gensor do not reflect a difference in the grammar or the rules for forming comparative adjectives, but are instead primarily lexical. The words ausor, belazor, gensor, and some of the other comparative adjectives are very infrequent in Old Occitan, and the lyric poets simply chose to use rare words that were not often included in other texts. This fundamentally lexical difference, however, appeared as a difference in the pattern of comparative adjective formation, as an artifact of the lexical choices made by speakers.

Variation has often been excluded, whether explicitly or implicitly, from discussions of language structure, particularly in syntactic theory. Even when variation in syntax or morphosyntax has been acknowledged and directly studied (e.g. Kroch 1994), it has been seen as competition between two grammars rather than as variation or optionality within a single grammar. Henry (2002) notes, however, that variability exists in a range of syntactic and morphosyntactic features in Belfast English, though each speaker does not use all variants of each feature. If there is a competition between grammars, Henry argues, “then it is between a wide range of grammars, not just two, and a better characterization seems to be that the individual structures/parameter settings are variable, rather than that there are separate grammars” (p. 274). Though none of the linguistic patterns discussed here are syntactic patterns, it does seem better to consider the kind of variation present in Old Occitan as variants within a single grammar rather than as competition between a great many grammars.

The patterns found in this study and the ways in which they interact with one another seem to relate to recent usage-based theories of language, in which grammar is conceptualized as the “cognitive organization of one’s experience with language” (Bybee 2006, p. 711). An exemplar theory of language could help explain how speakers might be sensitive, at least subconsciously, to patterns such as the ones discussed here. In an exemplar model of grammar, information about the usage and context of words is stored for all instances in which the word is heard. An exemplar of an adjective derived with -esc, for example, may be associated not only with the root of the adjective and with the derivational suffix -esc, but also with the circumstances in which it was used, among them the fact that it was used in lyric poetry. Because this information is stored and used for connecting and associating different parts of the grammar, an exemplar model of grammar may explain how speakers consistently use a significantly different distribution of, for example, diphthong and monophthong variants in different types of texts. Similarly, Henry (2002) notes that children learning syntactic and morphosyntactic constructions that show variation in the community not only acquire both variants at an early stage, but also use those variants in proportions that very closely reflect those in the input to which they were exposed. The acquisition device, and the syntax acquired, Henry argues, “must be frequency-sensitive in the way that frequency studies would lead us to expect” (p. 280).

A sensitivity to frequency and to usage would allow the associations of particular linguistic features with text type and other parameters to be developed and reproduced.

5.6 Conclusion

This research represents a compromise between studying the grammar and variation within an individual text and treating the entire Old Occitan corpus as a single, monolithic entity. The grammar of some individual prose texts has been studied at length, as editions of individual texts are often published with extensive grammatical analyses, but the insights from those individual texts have rarely been brought to bear on our understanding of the language as a whole. Rather than glossing over the variation and treating the language as a single invariant whole, or considering each individual text on a case-by-case basis, this thesis has considered all of the texts in the Old Occitan corpus with respect to the variation among and within them, in order to explore the ways in which this variation can be accounted for. This study has approached three aspects of the Old Occitan language both quantitatively and qualitatively, though the quantitative aspect is emphasized in Chapters Three and Four.

Because the Old Occitan corpus, as represented by the COM, is not tagged in any way, the comparisons we can easily make are restricted to string searches of various kinds. The study and comparison of all three aspects of language explored in this thesis, the comparative formation of adjectives, the derivation of adjectives, and the diphthongization of the mid vowels, is based on string searches in the COM. If the Old Occitan corpus were tagged morphologically or syntactically, more aspects of Old Occitan, and more complex ones, could be considered in the same way as the aspects considered here. For example, morphological tagging would allow us to easily distinguish words like Deu ‘god’ and deu, the third-person singular present tense form of dever ‘to owe; should’, allowing Deu ‘god’ to be used in analyses like those in Chapter Four. The digital COM is an invaluable tool that has made this study possible. Looking forward, however, tagging all or part of this corpus should be the next step.
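To illustrate why tagging matters for the Deu/deu case, here is a toy Python sketch. Everything in it is hypothetical: the token list, its lemma and part-of-speech tags, and the function name are invented for illustration, since the COM itself contains no such tags, and Dieu is assumed here as the front diphthong variant of Deu ‘god’. The sketch compares the share of the diphthong variant obtained from a plain string search with the share obtained once tokens are filtered by lemma.

```python
import re

# Hypothetical tagged tokens: (form, lemma, part of speech). The COM is untagged,
# so these tags are invented purely for illustration.
TOKENS = [
    ("Deu", "Deu", "NOUN"),     # 'god', monophthong variant
    ("Dieu", "Deu", "NOUN"),    # 'god', front diphthong variant (assumed form)
    ("deu", "dever", "VERB"),   # 'owes; should'
    ("Deu", "Deu", "NOUN"),
    ("deu", "dever", "VERB"),
]

DIPHTHONG = re.compile(r"dieu", re.IGNORECASE)
MONOPHTHONG = re.compile(r"deu", re.IGNORECASE)


def diphthong_share(tokens, use_tags):
    """Share of Dieu among all Deu/Dieu matches.

    use_tags=False mimics a plain string search, which also counts the verb
    form deu; use_tags=True keeps only tokens tagged as the noun lemma Deu.
    """
    if use_tags:
        tokens = [t for t in tokens if t[1] == "Deu" and t[2] == "NOUN"]
    diph = sum(1 for form, _, _ in tokens if DIPHTHONG.fullmatch(form))
    mono = sum(1 for form, _, _ in tokens if MONOPHTHONG.fullmatch(form))
    return diph / (diph + mono)


print("string search:", diphthong_share(TOKENS, use_tags=False))
print("tagged search:", round(diphthong_share(TOKENS, use_tags=True), 3))
```

In this toy data the untagged search counts the two verb forms as monophthong variants of ‘god’ and reports a diphthong share of 0.2, while the tagged search reports one third; in a real corpus the size and direction of such a bias would depend on how often the homographs occur.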

In Section 1.9, I quoted Milroy’s suggestion for exploring some of the constraints on variation in historically attested language; in this research, I have done just that, considering some of the constraints on variation and the factors that influence variation within Old Occitan. I have shown some of the ways in which the type of text in Old Occitan is associated with the use of certain variants. This study also highlights the interaction between patterns based on text type and those based on other parameters such as date and dialect, and illustrates some of the ways in which these differences and interactions can inform our understanding of the Old Occitan language and the way that the language, and these texts, are approached.


References

Adams, E. 1913. Word Formation in Provencal. New York: The MacMillan Company.

Agresti, A. 1990. Categorical data analysis. New York: John Wiley and Sons.

Akehurst, F. & Davis, J. M. (Eds.). 1995. Handbook of the Troubadours. University of California Press.

Alibert, L. 1937. Gramatica occitana, segon los parlers lengadocians. Tolosa: Societat d’estudis .

Alibert, L. 1965. Dictionnaire Occitan-Français d’après les parlers languedociens. Institut d’études Occitanes.

Allen, C. 1980. “Movement and deletion in Old English.” Linguistic Inquiry, 11(2), 261-323.

Anderson, J. & Rochet, B. 1979. Historical Romance morphology. Published for the University of Calgary by University Microfilms International.

Anglade, J. 1977. Grammaire de l’Ancien Provençal. Éditions Klincksieck.

Bartsch, K. 1856. Denkmäler der provenzalischen Litteratur. Litterarischer Verein.

Bec, P. 1973. Manuel pratique d’occitan moderne. Paris: Picard.

Bec, P. 1986. La langue occitane. 5th ed. Presses Universitaires de France.

Biber, D., Conrad, S., & Reppen, R. 1998. Corpus Linguistics: Investigating Structure and Use. Cambridge: Cambridge University Press.

Bishop, C., Kudrycz, W., Martin, J., Neil, B., & Speed, D. 2007. Text and transmission in Medieval Europe. In C. Bishop (Ed.), Text and transmission in Medieval Europe (pp. 1-17). Cambridge Scholars Publishers.

Borghi Cedrini, L. 1980. La lingua dei manoscritti valdesi e gli attuali dialetti delle Valli. BSSV, 148, 37-47.

Brunel, C. 1926. Les plus anciennes chartes en langue provençale. Paris: Éditions A. et J. Picard.

Brunel, C. 1952. Les plus anciennes chartes en langue provençale. Supplement. Paris: Éditions A. et J. Picard.

Cantalausa, J. 1990. Aux racines de notre langue: les langues populaires des Gaules de 480 à 1080. : Culture d’Oc.

Cantalausa, J. 2004. L’occitan véhiculaire du VIIIe siècle, l’occitan littéraire du Xe siècle: glossaire historique et étymologique. Le Monastère: Culture d’Oc.

Chambers, F. 1971. Proper names in the Lyrics of the Troubadours. Chapel Hill: University of North Carolina Press.

de Conca, M. 2008. Il lessico dei trovatori del periodo classico. Tome I: Arnaut Daniel (progetto pilota). Diss. University of Geneva.

Crescini, V. 1926. Manuale per l’avviamento agli studi provenzali. 3rd ed. Milan: Ulrico Hoepli.

Croft, W. 2006. The relevance of an evolutionary model of historical linguistics. In O. Thomsen (Ed.), Competing models of linguistic change. John Benjamins Publishing Company.

Dauzat, A. 1951. Dictionnaire étymologique des noms de famille et prénoms de France. Paris: Augé, Gillon, Hollier-Larousse, Moreau et Cie.

Dronke, P. 1968. The medieval lyric. Perennial.

Fisher, R. A. 1954. Statistical methods for research workers. Oliver and Boyd.

Fleischman, S. 1995. The non-lyric texts. In F. Akehurst & J. Davis (Eds.), A handbook of the troubadours (pp. 167-184). University of California Press.

Fleischman, S. 2000. Methodologies and ideologies in historical linguistics: on working with older languages. In S. Herring, P. Van Reenen, & L. Schøsler (Eds.), Textual parameters in older languages (pp. 33-58). John Benjamins Publishing Company.

Frank, I. 1957. Répertoire métrique de la poésie des troubadours. Paris: Champion.

Grafström, A. 1958. Etude sur la graphie des plus anciennes chartes languedociennes avec un essai d’interprétation phonétique. Uppsala: Almqvist & Wiksell.

Grafström, A. 1968. Etude sur la morphologie des plus anciennes chartes languedociennes. Almqvist & Wiksell.

Grandgent, C. 1905. An outline of the phonology and morphology of Old Provençal. Boston: D. C. Heath & Co. Publishers.

Hall, R. 1974. External History of the Romance Languages. New York: American Elsevier Publishing.

Hall, R. 1976. Proto-Romance phonology. New York: American Elsevier Publishing.

Harris, M. 1984. “Old Waldensian: Some linguistic and editorial observations.” Romance Philology, 38(2), 200-225.

Harris, M. & Vincent, N. 1988. The Romance languages. Oxford: Oxford University Press.

Henry, A. 2002. Variation and syntactic theory. In J. Chambers, P. Trudgill, & N. Schilling-Estes (Eds.), The handbook of language variation and change (pp. 267-280). Blackwell Publishing.

Herring, S. 2000. Poeticality and word order in Old Tamil. In S. Herring, P. Van Reenen, & L. Schøsler (Eds.), Textual parameters in older languages (pp. 197-236). John Benjamins Publishing Company.

Herring, S., Van Reenen, P. & Schøsler, L. 2000. On textual parameters and older languages. In S. Herring, P. Van Reenen, & L. Schøsler (Eds.), Textual parameters in older languages (pp. 1-32). John Benjamins Publishing Company.

Hill, R. & Bergin, T. (Eds.). 1941. Anthology of the Provençal Troubadours. New Haven: Yale University Press.

Hock, H.H. 1985. Pronoun fronting and the notion of verb-second position in Beowulf. In J. R. Faarlund (Ed.), Germanic Linguistics (pp. 70-86). Bloomington, IN: IULC.

Hock, H.H. 1991. Principles of historical linguistics. 2nd ed. Berlin: Mouton de Gruyter.

Hock, H.H. 2000. Genre, discourse, and syntax in early Indo-European, with emphasis on Sanskrit. In S. Herring, P. Van Reenen, & L. Schøsler (Eds.), Textual parameters in older languages (pp. 163-195). John Benjamins Publishing Company.

Hutcheson, B. 1995. Old English poetic meter. Cambridge: D. S. Brewer.

James, E. 1982. The origins of France. London: The Macmillan Press Ltd.

Janda, R. & Joseph, B. 2003. Reconsidering the canons of sound-change: Towards a ‘Big Bang’ theory. In B. Blake & K. Burridge (Eds.), Historical Linguistics 2001 (pp. 205-219).

Jensen, F. 1976. The Old Provençal noun and adjective declension. Odense University Press.

Jensen, F. 1986. The syntax of Medieval Occitan. Tübingen: Max Niemeyer Verlag.

Joseph, B. 2000. Textual Authenticity. In S. Herring, P. Van Reenen, & L. Schøsler (Eds.), Textual parameters in older languages (pp. 309-330). John Benjamins Publishing Company.

Joseph, B. 2006. On Projecting Variation Back into a Proto-Language, with Particular Attention to Germanic Evidence. In T. Cravens (Ed.), Variation and Reconstruction (pp. 103-118). Amsterdam: John Benjamins.

Jucker, A. 1992. Social stylistics: Syntactic variation in British newspapers. Berlin and New York: Mouton de Gruyter.

Kalos, M. & Whitlock, P. 2009. Monte Carlo Methods. John Wiley and Sons.

Kroch, A. 1994. Morphosyntactic variation. In K. Beals, J. Denton, R. Knippen, L. Melnar, H. Suzuki, & E. Zeinfeld (Eds.), Papers from the 30th regional meeting of the Chicago Linguistic Society. Volume 2: The Parasession on Variation in Linguistic Theory. Chicago Linguistic Society.

Labov, W. 1972. Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.

Lass, R. 1997. Historical linguistics and language change. Cambridge studies in linguistics.

Lausberg, H. 1962. Romanische Sprachwissenschaft, Vol. III. Berlin: Walter de Gruyter & Co.

Levy, E. 1904. Provenzalisches Supplement-Wörterbuch. Leipzig: O. R. Reisland.

Limentani, A. 1977. L’eccezione narrativa: La Provenza medievale e l’arte del racconto. Turin: Einaudi.

Marshall, J. 1972. The Razos de Trobas of Raimon Vidal and associated texts. London: Oxford University Press.

Matzke, J. 1898. “The question of free and checked vowels in Gallic Popular Latin.” PMLA, 13(1), 1-41.

Mendeloff, H. 1969. A manual of comparative romance linguistics: Phonology and morphology. Washington DC: Catholic University of America Press.

Meyer-Lübke, W. 1923. Grammaire des langues romanes. Reprint. New York: G. E. Stechert & Co.

The Provencal Database of the Project for American and French Research on the Treasury of the French Language (ARTFL).

Paden, W. 1995. Manuscripts. In F. Akehurst & J. Davis (Eds.), A handbook of the troubadours (pp. 307-333). University of California Press.

Paden, W. 1998. An introduction to Old Occitan. New York: The Modern Language Association of America.

Paden, W. 2000. Medieval lyric: genres in historical context. University of Illinois Press.

Pansier, P. 1974. Histoire de la langue provençale. Geneva: Slatkine Reprints.

Penny, R. 2002. A history of the Spanish language. 2nd ed. Cambridge University Press.

Preminger, A., & Brogan, T. (Eds.) 1993. The new Princeton encyclopedia of poetry and poetics. Princeton University Press.

Quine, W. 1981. Mathematical logic. 9th ed. Cambridge: Harvard University Press.

Raynouard, F. 1976. Grammaire comparée des langues de l’Europe latine dans leurs rapports avec la langue des troubadours. Geneva: Slatkine Reprints.

Recasens, D. 2004. “A production account of sound changes affecting diphthongs and in Romance.” Diachronica, 21(1), 161-197.

Ricketts, P. & Reed, A. 2005. Concordance de l’Occitan Médiéval. Brepols.

Rissanen, M. 1986. Variation and the study of English historical syntax. In D. Sankoff (Ed.), Diversity and diachrony (pp. 97-109). Amsterdam: J. Benjamins.

Sampson, S. 2010. Noun phrase word order variation in old English verse and prose. (Doctoral dissertation). Available from OhioLink ETD Center.

Scherman, K. 1987. The birth of France. New York: Random House.

Schneider, E. 2002. Investigating variation and change in written documents. In J. K. Chambers, P. Trudgill, & N. Schilling-Estes (Eds.), The Handbook of Language Variation and Change (pp. 67-96). Wiley-Blackwell.

Siegel, J. 1985. “Koines and koineization.” Language in Society, 14, 357-378.

Skårup, P. 1997. Morphologie élémentaire de l’ancien occitan. Etudes Romanes 37. Museum Tusculanum Press.

Smith, N. 1976. Figures of repetition in the Old Provençal lyric: A study in the style of the troubadours. North Carolina Studies in the Romance Languages and Literatures.

Smith, N. & Bergin, T. 1984. An Old Provençal Primer. New York: Garland Publishing.

Suwe, I. 1943. La Vida de sant Honorat, poème provençal de Raimond Feraud, tome I. Uppsala: A.-B. Lundequistska Bokhandeln.

Swales, J. 1990. Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.

Tekavčić, P. 1972. Morfosintassi. Vol. II of Grammatica storica dell’italiano. Bologna: Il Mulino.

Thiolier-Méjean, S. 1997. “Bertran Boysset d’, l’arpenteur de Dieu.” La France latine 125, 183-228.

Tuttle, E. H. 1917. “Notes on Romanic ‘e’ and ‘i’”. Modern Philology, 15(3), 181-192.

Tuttle, E. 1919. “Vowel-breaking in southern France.” Modern Philology, 16(11), 585-593.

Vidal, A. 1901. “Etablissement du marché à Montagnac.” Revue des langues romanes 44, 70-1.

Wallace-Hadrill, J. M. 1985. The barbarian west 400-1000. Basil Blackwell.

Wartburg, W. von. 1928-. Französisches etymologisches Wörterbuch. 25 vols. to date. Bonn: Klopp.

Wayland, D. 1982. A singer’s guide to the pronunciation of Old Provencal, Old French and Middle High German as it applies to the compositions of troubadours, trouveres and minnesingers. Bowling Green University Thesis.

Wilson, C. 2009. “A Closer Look at Early in Troubadour Poetry.” Paper presented at the International Conference of Historical Linguistics (August 10-15 2009).

Wilson, C. 2010. “‘Sporadic Diphthongs’ in Old Occitan: Dialect Borrowing, Spelling Variation, Koineization, Analogy, Lexical Diffusion, or What?” Paper presented at the International Medieval Congress (May 13-16, 2010).

Zufferey, F. 1987. Recherches linguistiques sur les chansonniers provençaux. Geneva: Publications Romanes et Françaises.

Zumthor, P. 1995. An overview: Why the troubadours? In F. Akehurst & J. Davis (Eds.), A handbook of the troubadours (pp. 11-18). University of California Press.

Appendix A: Index of Texts used in Examples

Lyric Poetry Texts

| PC | Author | Title/First Line | Source of the Edited Text in the COM |
|---|---|---|---|
| PC 029 017 | Arnaut Daniel | Si·m fos Amors de joi donar tant larja | Toja, G. 1960. Arnaut Daniel: Canzoni, edizione critica, studio introduttivo, commento e traduzione a cura di Gianluigi Toja. Florence: Sansoni, pp. 359-65. |
| PC 155 020 | Folquet de Marselha | Si com sel qu’es tan greujatz | Squillacioti, P. 1999. Le poesie di Folchetto di Marsiglia (Biblioteca degli Studi mediolatini e volgari, Nuova serie, XIV). Pisa: Pacini, pp. 404-9. |
| PC 401 007 | Raimon Gaucelm (de Béziers) | Quascus planh lo sieu dampnatge | Radaelli, A. 1997. Raimon Gaucelm de Béziers: poesie, edizione critica a cura di Anna Radaelli. Firenze: La Nuova Italia Editrice, pp. 164-8. |
| PC 410 006 | Raimon de Tors | Per l’avinen pascor | Parducci, A. 1911. "Raimon de Tors, trovatore marsigliese del sec. XIII." Studj romanzi 7, 5-59. |

Non-Lyric Poetry Texts

| Abbrev. | Title | Date | Location/Dialect | Source of the Edited Text in the COM |
|---|---|---|---|---|
| CCA | Chanson de la Croisade contre les albigeois | early 13th c. (ms. may be late 13th c.) | Toul. - Languedoc | Gougaud, H. (Ed.). 1989. Chanson de la Croisade Albigeoise, texte original [d'Eugène Martin-Chabot]. Lettres Gothiques. Paris: Livre de Poche. |
| CSF | Chanson de Sainte Foy | mid-13th c. | | Cianciòlo, U. 1941. "Il compendio provenzale verseggiato della Chirurgia di Ruggero da Salerno", Archivum romanicum 25, 1-83. |

Prose Texts

| Abbrev. | Title | Date | Location/Dialect | Topic | Source of the Edited Text in the COM |
|---|---|---|---|---|---|
| FDBE | Fors de Béarn | - | Gascon | administrative | Courteault, H. 1902. Une chronique béarnaise inédite. In Mélanges Léonce Couture: Études d’histoire méridionale dédiées à la mémoire de Léonce Couture (1832-1902). Toulouse: Privat, 127-35. |
| JDFR | Regles de Trobar (ms. R) | end of 14th c. | | grammar | Marshall, J. 1972. The Razos de Trobar of Raimon Vidal and Associated Texts. Oxford: O.U.P., pp. 59-91. |
| NTVZ | Nouveau Testament vaudois de Zurich | mid-15th c. | Waldensian | religious | Salvioni, C. 1890. “Il Nuovo Testamento valdese, secondo la lezione del Codice di Zurigo.” Archivio glottologico italiano, 11, 1-308. |