The Relative Divergence of Dutch Dialect Pronunciations from Their Common Source: an Exploratory Study

The relative divergence of Dutch dialect pronunciations from their common source: an exploratory study

Wilbert Heeringa Brian Joseph Department of Humanities Computing Department of Linguistics University of Groningen The Ohio State University Groningen, The Netherlands Columbus, Ohio, USA [email protected] [email protected]

1 Introduction Abstract In Dutch dialectology the Reeks Nederlandse Dia- In this paper we use the Reeks Nederlandse lectatlassen (RND), compiled by Blancquaert & Dialectatlassen as a source for the recon- Pée (1925-1982) is an invaluable data source. The struction of a ‘proto-language’ of Dutch atlases cover the Dutch language area. The Dutch dialects. We used 360 dialects from loca- area comprises The Netherlands, the northern part tions in the Netherlands, the northern part of Belgium (Flanders), a smaller northwestern part of Belgium and French-Flanders. The den- of France, and the German county of Bentheim. sity of dialect locations is about the same The RND contains 1956 varieties, which can be everywhere. For each dialect we recon- found in 16 volumes. For each dialect 139 sen- structed 85 words. For the reconstruction of tences are translated and transcribed in phonetic vowels we used knowledge of Dutch his- script. Blancquaert mentions that the questionnaire tory, and for the reconstruction of conso- used for this atlas was conceived of as a range of nants we used well-known tendencies sentences with words that illustrate particular found in most textbooks about historical sounds. The design was such that, e.g., various linguistics. We validated results by com- changes of older Germanic vowels, diphthongs and paring the reconstructed forms with pro- consonants are represented in the questionnaire nunciations according to a proto-Germanic (Blancquaert 1948, p. 13). We exploit here the his- dictionary (Köbler, 2003). For 46% of the torical information in this atlas. words we reconstructed the same vowel or The goals of this paper are twofold. First we aim the closest possible vowel when the vowel to reconstruct a ‘proto-language’ on the basis of to be reconstructed was not found in the the RND dialect material and see how close we dialect material. For 52% of the words all come to the protoforms found in Gerhard Köbler’s consonants we reconstructed were the neuhochdeutsch-germanisches Wörterbuch same. For 42% of the words, only one con- (Köbler, 2003). We recognize that we actually re- sonant was differently reconstructed. We construct a stage that would never have existed in measured the divergence of Dutch dialects prehistory. In practice, however, we are usually from their ‘proto-language’. We measured forced to use incomplete data, since data collec- pronunciation distances to the proto- tions -- such as the RND – are restricted by politi- language we reconstructed ourselves and cal boundaries, and often some varieties are lost. correlated them with pronunciation dis- In this paper we show the usefulness of a data tances we measured to proto-Germanic source like the RND. based on the dictionary. Pronunciation dis- Second we want to measure the divergence of tances were measured using Levenshtein Dutch dialects compared to their proto-language. distance, a string edit distance measure. We We measure the divergence of the dialect pronun- found a relatively strong correlation ciations. We do not measure the number of (r=0.87). changes that happened in the course of time. For

31 Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, pages 31–39, Prague, June 2007. c 2007 Association for Computational Linguistics example if a [u] changed into a [y] and then the [y] dialects, the [d] in 43% of the dialects and the [h] in changed into a [u], we simply compare the [u] to 20% of the dialects. 1 According to the neu- the proto-language pronunciation. However, we do hochdeutsch-germanisches Wörterbuch the [@] or compare Dutch dialects to both the proto-language we reconstruct ourselves, which we call Proto- [@h] is the original sound. Our data show that sim- Language Reconstructed (PLR), and to the Proto- ply reconstructing the most frequent sound, which language according to the proto-Germanic Dic- is the [d], would not give the original sound, but tionary, which we call Proto-Germanic according using the chain the original sound is easily found. to the Dictionary (PGD). To get evidence that the /@h/ has raised to /d/ (and probably later to /h/) in a particular word, we 2 Reconstructing the proto-language need evidence that the /d/ was part of the chain. From the nearly 2000 varieties in the RND we Below we discuss another chain where the /h/ has selected 360 representative dialects from locations lowered to /`h/, and where the /d/ is missing in the in the Dutch language area. The density of locations is about the same everywhere. chain. To be sure that the /d/ was part of the chain, In the RND, the same 141 sentences are trans- we consider the frequency of the /d/, i.e. the num- lated and transcribed in phonetic script for each ber of dialects with /d/ in that particular word. The dialect. Since digitizing the phonetic texts is time- frequency of /d/ should be higher than the fre- consuming on the one hand, and since our proce- dure for measuring pronunciation distances is a quency of /D/ and/or higher than the frequency of word-based method on the other hand, we initially /h/. Similarly for the change from /@t/ to /n/ we selected from the text only 125 words. Each set consider the frequency of /n/. represents a set of potential cognates, inasmuch as Another development mentioned by Van Bree is they were taken from translations of the same sen- that high monophthongs diphthongize. In the tran- tence in each case. In Köbler’s dictionary we found sition from middle Dutch to modern Dutch, the translations of 85 words only; therefore our analy- monophthong /hÙ/ changed into /Dh/, and the mo- ses are based on those 85 words. nophthong /xÙ/ changed into either /ªx/ or /åh/ We use the comparative method (CM) as the (Van der Wal, 1994). According to Van Bree main tool for reconstructing a proto-form on the (1996, p. 99), diphthongs have the tendency to basis of the RND material. In the following sub- lower. This can be observed in Polder Dutch where sections we discuss the reconstruction of vowels and consonants respectively. /Dh/ and /ªx/ are lowered to /`h/ and /`t/ (Stroop 1998). We recognize the following chains: 2.1 Vowels For the reconstruction of vowels we used knowl- h → Dh → @h edge about sound developments in the history of x → ªx/åh → @t Dutch. In Old Dutch the diphthongs /`h/ and /`t/ t → åt → @t turned into monophthongs /dÙ/ and /nÙ/ respectively (Quak & van der Horst 2002, p. 32). Van Bree Different from the chains mentioned above, we (1996) mentions the tendencies that lead /dÙ/ and do not find the /d/ and /n/ respectively in these /nÙ/ to change into /hÙ/ and /tÙ/ respectively. From chains. To get evidence for these chains, the fre- these data we find the following chains: quency of /d/ should be lower than both the frequency of /D/ and /h/, and the frequency of /n/ @h → Dh → d → h should be lower than both /å/ and /t/. @t → åt → n → t Sweet (1888, p. 20) observes that vowels have the tendency to move from back to front. Back An example is twee ‘two’ which has the vowel

[@] in 11% of the dialects, the [D] in 14% of the 1 The sounds mentioned may be either monophthongs or diphthongs.

32 vowels favour rounding, and front vowels un- may be found in other contexts as well.” In our rounding. From this, we derive five chains: data set zes ‘six’ is pronounced with a initial [y] in most cases and with an initial [r] in the dialects of h ← x ← t 2 Stiens and Dokkum. We reconstructed [r]. H ← X ← T Final voiced obstruents of an utterance become d ← N ← n voiceless. Sweet (1888, p. 18) writes that the natu- D ← ª ← å ral isolative tendency is to change voice into un- ` ← ← ← @ voiced. He also writes that the “tendency to un- voicing is shown most strongly in the stops.” Hock

& Joseph (1996, p. 129) write that final devoicing However, unrounded front vowels might be- “is not confined to utterance-final position but ap- come rounded under influence from a labial or plies word-finally as well.” 3 In our data set we labiodental consonant. For example vijf ‘five’ is found that for example the word-final consonant sometimes pronounced as [uhe] and sometimes as in op ‘on’ is sometimes a [p] and sometimes a [b]. [uxe]. The [h] has been changed into [x] under in- Based on this tendency, we reconstruct the [b]. fluence of the labiodental [u] and [e]. Plosives become fricatives between vowels, be- Sweet (1888, p. 22) writes that the dropping of fore vowels or sonorants (when initial), or after unstressed vowels is generally preceded by various vowels (when final). Sweet writes that the “opening weakenings in the direction of a vowel close to of stops generally seems to begin between vow- schwa. In our data we found that the word mijn els…” (p. 23). Somewhat further he writes that in ‘my’ is sometimes [h] and sometimes [©]. A non- Dutch the g has everywhere become a fricative central unstressed vowel might change into a cen- while in German the initial g remained a stop. For © tral vowel which in turn might be dropped. In gen- example goed ‘good’ is pronounced as [ft= s] in eral we assume that deletion of vowels is more Frisian dialects, while other dialects have initial [¿] likely than insertion of vowels. or [w]. Following the tendency, we consider the [f] Most words in our data have one syllable. For to be the older sound. Related to this is the pronun- each word we made an inventory of the vowels ciation of words like schip ‘ship’ and school used across the 360 varieties. We might recognize ‘school’. As initial consonants we found [sk], [sx] a chain in the data on the basis of vowels which and [R]. In cases like this we consider the [sk] as appear at least two times in the data. For 37 words the original form, although the [k] is not found be- we could apply the tendencies mentioned above. In tween vowels, but only before a vowel. the other cases, we reconstruct the vowel by using Oral vowels become nasalized before nasals. the vowel found most frequently among the 360 Sweet (1888) writes that “nothing is more common varieties, working with Occam’s Razor as a guid- than the nasalizing influence of a nasal on a pre- ing principle. When both monophthongs and diph- ceding vowels” and that there “is a tendency to thongs are found among the data, we choose the drop the following nasal consonant as superfluous” most frequent monophthong. Sweet (1888, p. 21) when “the nasality of a vowel is clearly developed” writes that isolative diphthongizaton “mainly af- and “the nasal consonant is final, or stands before fects long vowels, evidently because of the diffi- another consonant.” (p. 38) For example gaan ‘to culty of prolonging the same position without change.” go’ is pronounced as [f@1Ùm] in the dialect of Dok-

2.2 Consonants 2 In essence, in this and other such cases, a version of For the reconstruction of consonants we used ten the manuscript-editing principle of choosing the “lectio difficilior” was our guiding principle. tendencies which we discuss one by one below. 3 Initial and medial voiceless obstruents become We do feel, however, that word-final devoicing, even though common cross-linguistically, is, as Hock voiced when (preceded and) followed by a voiced 1976 emphasizes, not phonetically determined but sound. Hock & Joseph (1996) write that weakening rather reflects the generalization of utterance-final (or lenition) “occurs most commonly in a medial developments into word-final position, owing to the voiced environment (just like Verner’s law), but overlap between utterance-finality and word-finality.

33 kum, and as [fd(=©]( in the dialect of Stiens. The na- fore the /d/ and /t/ respectively. In Old Dutch ol salized [d(=©] in the pronunciation of Stiens already changed into ou (Van Loey 1967, p. 43, Van Bree 1987, p. 135/136). Therefore we reconstruct the /l/ indicates the deletion of a following nasal. Consonants become palatalized before front with preceding /n/ or /@/. vowels. According to Campbell (2004) “palatalization often takes place before or after i and j or be- 3 The proto-language according to the fore other front vowels, depending on the lan- dictionary guage, although unconditioned palatalization can The dictionary of Köbler (2003) provides Ger- also take place.” An example might be vuur which manic proto-forms. In our Dutch dialect data set © is pronounced like [eit= q] in Frisian varieties, we have transcriptions of 125 words per dialect. while most other varieties have initial [e] or [u] We found 85 words in the dictionary. Other words followed by [h] or [x]. were missing, especially plural nouns, and verb Superfluous sounds are dropped. Sweet (1888) forms other than infinitives are not included in this introduced this principle as one of the principles of dictionary. economy (p. 49). He especially mentioned that in For most words, many proto-Germanic forms are given. We used the forms in italics only since [Mf] the superfluous [f] is often dropped (p. 42). In these are the main forms according to the author. If our data we found that krom ‘curved’ is pro- different lexical forms are given for the same nounced [jqT=l] in most cases, but as [jqT(=lo] in word, we selected only variants of those lexical the dialect of Houthalen. In the reconstructed form forms which appear in standard Dutch or in one of we posit the final [o]. the Dutch dialects. Medial [h] deletes between vowels, and initial The proto-forms are given in a semi-phonetic [h] before vowels. The word hart ‘heart’ is some- script. We converted them to phonetic script in times pronounced with and sometimes without ini- order to make them as comparable as possible to tial [g]. According to this principle we reconstruct the existing Dutch dialect transcriptions. This ne- the [g]. cessitated some interpretation. We made the fol- z lowing interpretation for monophthongs: [r] changes to [ ]. According to Hock and Jo- seph (1996) the substitution of uvular [z] for trilled spel- pho- spel- pho- spel- pho- (post-)dental [q] is an example of an occasional ling netic ling netic ling netic change apparently resulting from misperception. In h H* P P* t t* the word rijp ‘ripe’ we find initial [q] in most cases H◊ hÙ* P" PÙ* t" tÙ* and [z] in the dialects of Echt and Baelen. We re- d D* ` @* n å* constructed [q]. d" dÙ* `" `Ù* n" nÙ* U Syllable initial [w] changes in [ ]. Under ‘Lip to Lip-teeth’ Sweet (1888) writes that in “the Diphthongs are interpreted as follows: change of p into f, w into v, we may always assume an intermediate [¥], [A], the latter being the Mid- spel- pho- spel- pho- dle German w“ (p. 26), and that the “loss of back ling netic ling netic modification is shown in the frequent change of ai @=h* ei D=h* v t* t* (w) into (v) through [A], as in Gm.” Since – au @= eu D= meant as “voiced lip-to-teeth fricative” – is close to

[U] – lip-to-teeth sonorant – we reconstruct [v] if We interpreted the consonants according to the both [v] and [U] are found in the dialect pronuncia- following scheme: tions. This happens for example in the word wijn ‘wine’. The cluster ol+d/t diphthongizes to ou + d/t. For example English old and German alt have a /l/ be-

34 spel- pho- spel- pho- spel- pho- 4 Measuring divergence of Dutch dialect ling netic ling netic ling netic pronunciations with respect to their p o* f e* m l* proto-language b n a, u* * s* m, M* Once a protolanguage is reconstructed, we are able t s* s r* ng M* to measure the divergence of the pronunciations of d c* z y* w v descendant varieties with respect to that protolan- k j* h w, g* r q* guage. For this purpose we use Levenshtein distance, which is explained in Section 4.1. In Sec- g l f, ¿* k* tions 4.2 the Dutch dialects are compared to PLR j i* and PGD respectively. In Section 4.3 we compare PLR with PGD. Lehmann (2005-2007) writes that in the early stage of Proto-Germanic “each of the obstruents 4.1 Levenshtein distance had the same pronunciation in its various In 1995 Kessler introduced the Levenshtein dis- locations…”. “Later, /b d g/ had fricative tance as a tool for measuring linguistic distances allophones when medial between vowels. between language varieties. The Levenshtein dis- Lehmann (1994) writes that in Gothic “/b, d, g/ tance is a string edit distance measure, and Kessler has stop articulation initially, finally and when applied this algorithm to the comparison of Irish doubled, fricative articulation between vowels.” dialects. Later the same technique was successfully We adopted this scheme, but were restricted by the applied to Dutch (Nerbonne et al. 1996; Heeringa RND consonant set. The fricative articulation of 2004: 213–278). Below, we give a brief explana- /a/ would be [A] or [u]. We selected the [u] since tion of the methodology. For a more extensive ex- this sound is included in the RND set. The fricative planation see Heeringa (2004: 121–135). articulation of /c/ would be [C], but this consonant 4.1.1 Algorithm is not in the RND set. We therefore used the [c] which we judge perceptually to be closer to the [C] Using the Levenshtein distance, two varieties are than to the [y]. The fricative articulation of /f/ is compared by measuring the pronunciation of /¿/ which was available in the RND set. words in the first variety against the pronunciation of the same words in the second. We determine We interpreted the h as [g] in initial position, how one pronunciation might be transformed into and as [w] in medial and final positions. An n be- the other by inserting, deleting or substituting fore k, g or h is interpreted as [M], and as [m] in all sounds. Weights are assigned to these three opera- other cases. The should actually be interpreted as tions. In the simplest form of the algorithm, all op- [S], but this sound in not found in the RND set. erations have the same cost, e.g., 1. Just as we use [c] for [C], analogously we use [s] Assume the Dutch word hart ‘heart’ is pronounced as [g@qs] in the dialect of Vianen (The for [S]. We interpret double consonants are gemi- nates, and transcribe them as single long conso- Netherlands) and as [Pqs©] in the dialect of Naz- nants. For example nn becomes [mÙ]. areth (Belgium). Changing one pronunciation into the other can be done as follows: Several words end in a ‘-’ in Köbler’s dictionary, meaning that the final sounds are unknown or * irrelevant to root and stem reconstructions. In our g@qs delete g 1 transcriptions, we simply note nothing. @qs subst. @/P 1 Pqs insert © 1 Pqs©*  3

35 In fact many string operations map [g@qs] to each sound a spectrogram was made with PRAAT [Pqs©]. The power of the Levenshtein algorithm is using the so-called Barkfilter, a perceptually ori- ented model. On the basis of the Barkfilter repre- that it always finds the least costly mapping. sentation, segment distances were calculated. In- To deal with syllabification in words, the serted or deleted segments are compared to silence, Levenshtein algorithm is adapted so that only a and silence is represented as a spectrogram in vowel may match with a vowel, a consonant with a which all intensities of all frequencies are equal to consonant, the [j] or [w] with a vowel (or opposite), the [i] or [u] with a consonant (or opposite), 0. We found that the [>] is closest to silence and and a central vowel (in our research only the the [`] is most distant. This approach is described schwa) with a sonorant (or opposite). In this way extensively in Heeringa (2004, pp. 79-119). unlikely matches (e.g. a [p] with an [a]) are pre- In perception, small differences in pronunciation vented.4 The longest alignment has the greatest may play a relatively strong role in comparison to number of matches. In our example we thus have larger differences. Therefore we used logarithmic the following alignment: segment distances. The effect of using logarithmic distances is that small distances are weighted rela- g* @* q* s* tively more heavily than large distances. * * P* q* s* ©* 4.1.3 Processing RND data 

1 1 1 The RND transcribers use slightly different nota-

tions. In order to minimize the effect of these dif- 4.1.2 Operations weights ferences, we normalized the data for them. The

consistency problems and the way we solved them The simplest versions of this method are based on are extensively discussed in Heeringa (2001) and a notion of phonetic distance in which phonetic Heeringa (2004). Here we mention one problem overlap is binary: non-identical phones contribute which is highly relevant in the context of this pa- to phonetic distance, identical ones do not. Thus per. In the RND the ee before r is transcribed as the pair [h,Ä] counts as different to the same degree [dÙ] by some transcribers and as [H=] by other tran- as [h,H]. The version of the Levenshtein algorithm scribers, although they mean the same pronuncia- which we use in this paper is based on the com- tion as appears from the introductions of the differ- parison of spectrograms of the sounds. Since a ent atlas volumes. A similar problem is found for spectrogram is the visual representation of the oo before r which is transcribed either as [nÙ] or acoustical signal, the visual differences between the spectrograms are reflections of the acoustical [T=], and the eu before r which is transcribed as [NÙ] differences. The spectrograms were made on the or [X=]. Since similar problems may occur in other basis of recordings of the sounds of the Interna- contexts as well, the best solution to overcome all tional Phonetic Alphabet as pronounced by John of these problems appeared to replace all [H]’s by Wells and Jill House on the cassette The Sounds of 5 [d]’s, all [T]’s by [n]’s, and all [X]’s by [N]’s, even the International Phonetic Alphabet from 1995. though meaningful distinctions get lost. The different sounds were isolated from the re- Especially suprasegmentals and diacritics might cordings and monotonized at the mean pitch of be used diffferently by the transcribers. We process each of the two speakers with the program 6 the diacritics voiceless, voiced and nasal only. For PRAAT (Boersma & Weenink, 2005). Next, for details see Heeringa (2004, p. 110-111). The distance between a monophthong and a 4 Rather than matching a vowel with a consonant, the diphthong is calculated as the mean of the distance algorithm will consider one of them as an insertion between the monophthong and the first element of and another as a deletion. 5 See http://www.phon.ucl.ac.uk/home/wells/cassette.htm. the Institute of Pronunciation Sciences of the 6 The program PRAAT is a free public-domain program University of Amsterdam and is available at developed by Paul Boersma and David Weenink at http://www.fon.hum.uva.nl/praat.

36 the diphthong and the distance between the mo- PGD. The Limburg dialects have shared in the nophthong and the second element of the diph- High German Consonant Shift. Both the Belgium thong. The distance between two diphthongs is and Dutch Limburg dialects are found east of the calculated as the mean of the distance between the Uerdinger Line between Dutch ik/ook/-lijk and first elements and the distance between the second High German ich/auch/-lich. The Dutch Limburg elements. Details are given in Heeringa (2004, p. dialects are found east of the Panninger Line be- 108). tween Dutch sl/sm/sn/sp/st/zw and High German schl/schm/schn/schp/scht/schw (Weijnen 1966). 4.2 Measuring divergence from the proto- The Limburg dialects are also characterized by the languages uvular [z] while most Dutch dialects have the al- The Levenshtein distance enables us to compare veolar [q]. All of this shows that Limburg conso- each of the 360 Dutch dialects to PLR and PGD. nants are innovative. Since we reconstructed 85 words, the distance be- The map based on vowel substitutions shows tween a dialect and a proto-language is equal to the that Frisian vowels are not particularly close to average of the distances of 85 word pairs. PGD. Frisian is influenced by the Ingvaeonic Figures 1 and 2 show the distances to PLR and sound shift. Among other changes, the [X] changed PGD respectively. Dialects with a small distance are represented by a lighter color and those with a into [H], which in turn changed into [D] in some large distance by a darker color. In the map, dia- cases (Dutch dun ‘thin’ is Frisian tin) (Van Bree 8 lects are represented by polygons, geographic dia- 1987, p. 69). Besides, Frisian is characterized by lect islands are represented by colored dots, and its falling diphthongs, which are an innovation as linguistic dialect islands are represented by dia- well. When we consulted the map based on conso- monds. The darker a polygon, dot or diamond, the nant substitutions, we found the Frisian consonants greater the distance to the proto-language. close to PGD. For example the initial /g/ is still The two maps show similar patterns. The dia- pronounced as a plosive as in most other Germanic lects in the Northwest (Friesland), the West varieties, but in Dutch dialects – and in standard (Noord-Holland, Zuid-Holland, Utrecht) and in the Dutch – as a fricative. middle (Noord-Brabant) are relatively close to the When we consider West-Flemish, we find the proto-languages. More distant are dialects in the vowels closer to PGD than the consonants, but Northeast (Groningen, Drenthe, Overijssel), in the they are still relatively distant to PGD. Southeast (Limburg), close to the middle part of 4.3 PLR versus PGD the Flemish/Walloon border (Brabant) and in the southwest close to the Belgian/French state border When correlating the 360 dialect distances to PLR (West-Vlaanderen). with the 360 dialect distances to PGD, we obtained According to Weijnen (1966), the Frisian, Lim- a correlation of r=0.87 (p<0.0001)9. This is a sig- burg and West-Flemish dialects are conservative. nificant, but not perfect correlation. Therefore we Our maps shows that Frisian is relatively close to compared the word transcriptions of PLR with proto-Germanic, but Limburg and West-Flemish those of PGD. are relatively distant. We therefore created two maps, one which shows distances to PGD based on vowel substitutions in stressed syllables only, and another showing distances to PGD on the basis of consonant substitutions only.7 Looking at the map based on vowel substitutions we find the vowels of the Dutch province of Lim- 8 burg and the eastern part of the province Noord- The Ingvaeonic sound shift affected mainly Frisian Brabant relatively close to PGD. Looking at the and English, and to a lesser degree Dutch. We mention here the phenomenon found in our data map based on consonant substitutions we find the most frequently. consonants of the Limburg varieties distant to 9 For finding the p-values we used with thanks: VassarStats: Website for Statistical Computation at: 7 The maps are not included in this paper. http://faculty.vassar.edu/lowry/VassarStats.html.

Figure 1. Distances of 360 Dutch dialects com- Figure 2. Distances of 360 Dutch dialects compared to PLR. Dialects are represented by poly- pared to PGD. Dialects are represented by polygons, geographic dialect islands are represented by gons, geographic dialect islands are represented by colored dots, and linguistic dialect islands are rep- colored dots, and linguistic dialect islands are represented by diamonds. Lighter polygons, dots or resented by diamonds. Lighter polygons, dots or diamonds represent more conservative dialects and diamonds represent more conservative dialects and darker ones more innovative dialects. darker ones more innovative dialects.

First we focus on the reconstruction of vowels. Looking at the consonants, we found 44 words We find 28 words for which the reconstructed which have the same consonants as in PGD.11 For vowel of the stressed syllable was the same as in 36 words only one consonant was different, where PGD10. In 15 cases, this was the result of applying most words have at least two consonants. This the tendencies discussed in Section 2.1. In 13 cases shows that the reconstruction of consonants works this was the result of simply choosing the vowel much better than the reconstruction of vowels. found most frequently among the 360 word pronunciations. When we do not use tendencies, but 5 Conclusions simply always choose the most frequent vowel, we In this paper we tried to reconstruct a ‘proto- obtain a correlation which is significantly lower language’ on the basis of the RND dialect material (r=0.74, p=0). and see how close we come to the protoforms We found 29 words for which vowel was recon- found in Köbler’s proto-Germanic dictionary. We structed different from the one in PGD, although reconstructed the same vowel as in PGD or the the PGD vowel was found among at least two dia- closest possible vowel for 46% of the words. lects. For 28 words the vowel in the PGD form was Therefore, the reconstruction of vowels still needs not found among the 360 dialects, or only one to be improved further. time. For 11 of these words, the closest vowel The reconstructions of consonants worked well. found in the inventory of that word, was recon- For 52% of the words all consonants reconstructed structed. For example the vowel in ook ‘too’ is are the same as in PGD. For 42% of the words, [@t] in PGD, while we reconstructed [åt]. only one consonant was differently reconstructed. And, as a second goal, we measured the divergence of Dutch dialects compared to their proto-

10 For some words PGD gives multiple pronunciations. 11 When PGD has multiple pronunciations, we count the We count the number of words which has the same number of words for which the consonants are the vowel in at least one of the PGD pronunciations. same as in at least one of the PGD pronunciations.

38 language. We calculated dialect distances to PLR thesis, Rijksuniversiteit Groningen, Groningen. Avai- and PGD, and found a correlation of r=0.87 be- lable at: http://www.let.rug.nl/~heeringa/dialectology tween the PLR distances and PGD distances. The /thesis. high correlation shows the relative influence of Hans Henrich Hock & Brian D. Joseph. 1996. wrongly reconstructed sounds. Language History, Language Change, and Language When we compared dialects to PLR and PGD, Relationship: an Introduction to Historical and we found especially Frisian close to proto- Comparative Linguistics. Mouton de Gruyter, Berlin Germanic. When we distinguished between vowels etc. and consonants, it appeared that southeastern dia- Brett Kessler. 1995. Computational dialectology in Irish lects (Dutch Limburg and the eastern part of Gaelic. Proceedings of the 7th Conference of the Noord-Brabant) have vowels close to proto- European Chapter of the Association for Computa- Germanic. Frisian is relatively close to proto- tional Linguistics, 60-67. EACL, Dublin. Germanic because of its consonants. Gerhard Köbler. 2003. Neuhochdeutsch-germanisches Wörterbuch. Available at: http:// Acknowledgements www.koeblergerhard.de/germwbhinw.html. We thank Peter Kleiweg for letting us use the pro- Winfred P. Lehmann. 1994. Ghotic and the Reconstruc- grams which he developed for the representation of tion of Proto-Germanic. In: Ekkehard König & Johan the maps. We would like to thank Prof. Gerhard van der Auwera, eds. The Germanic Languages, 19- Köbler for the use of his neuhochdeutsch- 37. Routledge, London & New York. germanisches Wörterbuch and his explanation Winfred P. Lehmann. 2005-2007. A Grammar of Proto- about this dictionary and Gary Taylor for his ex- Germanic. Online books edited by Jonathan Slocum. planation about proto-Germanic pronunciation. We Available at: http://www.utexas.edu/cola/centers/lrc/ also thank the members of the Groningen Dialec- books/pgmc00.html. tometry group for useful comments on a earlier Adolphe C. H. van Loey. 1967. Inleiding tot de histori- version of this paper. We are grateful to the sche klankleer van het Nederlands. N.V. W.J. Thie- anonymous reviewers for their valuable sugges- me & Cie, Zutphen. tions. This research was carried out within the John Nerbonne & Wilbert Heeringa & Erik van den framework of a talentgrant project, which is sup- Hout & Peter van der Kooi & Simone Otten & Wil- ported by a fellowship (number S 30-624) lem van de Vis. 1996. Phonetic Distance between from the Netherlands Organisation of Scientific Dutch Dialects. In: Gert Durieux & Walter Daele- Research (NWO). mans & Steven Gillis, eds. CLIN VI, Papers from the sixth CLIN meeting, 185-202. University of Antwerp, References Center for Dutch Language and Speech, Antwerpen. Edgar Blancquaert & Willem Pée, eds. 1925-1982. Arend Quak & Johannes Martinus van der Horst. 2002. Reeks Nederlandse Dialectatlassen. De Sikkel, Ant- Inleiding Oudnederlands. Leuven University Press, werpen. Leuven. Paul Boersma & David Weenink 2005. Praat: doing Jan Stroop. 1998. Poldernederlands; Waardoor het phonetics bycomputer. Computer program retrieved ABN verdwijnt, Bakker, Amsterdam. from http://www.praat.org. Henry Sweet. 1888. A History of English Sounds from Cor van Bree. 1987. Historische Grammatica van het the Earliest Period. Clarendon Press, Oxford. Nederlands. Foris Publications, Dordrecht. Marijke van der Wal together with Cor van Bree. 1994. Geschiedenis van het Nederlands. Aula-boeken. Het Cor van Bree. 1996. Historische Taalkunde. Acco, Leu- nd ven. Spectrum, Utrecht, 2 edition. Antonius A. Weijnen. 1966. Nederlandse dialectkunde. Wilbert Heeringa. 2001. De selectie en digitalisatie van nd dialecten en woorden uit de Reeks Nederlandse Dia- Studia Theodisca. Van Gorcum, Assen, 2 edition. lectatlassen. TABU: Bulletin voor taalwetenschap, 31(1/2):61-103. Wilbert Heeringa. 2004. Measuring Dialect Pronuncia- tion Differences using Levenshtein Distance. PhD