Chapter 8 Measuring Norwegian Dialect Distances

In Chapter 7 a range of computational comparison methods was validated. The method with the highest score is a variant of the Levenshtein distance in which (i) segment distances are found on the basis of the Barkfilter representation, (ii) four length gradations are used, (iii) diphthongs are represented as a sequence of two segments, and (iv) logarithmic segment distances are used (Section 7.5.1). This method was applied to a small set of 15 Norwegian dialects (Section 7.5.2). In this chapter we apply the same method to a larger set of 55 Norwegian varieties. The results will be compared to the dialect map of Skjekkeland (1997).

In Section 8.1 the set of 55 varieties is discussed. On the basis of the Levenshtein distances we perform cluster analysis and multidimensional scaling. In Section 8.2 the results of cluster analysis are presented, and in Section 8.3 the results of multidimensional scaling. The discussion of the results should be regarded as a first step. Further analysis of the results may be useful future work. In Section 8.4 we draw some conclusions.

8.1 Data source

In Section 7.2 we described a database which contains recordings of different Norwegian varieties. The database was compiled by Jørn Almberg and Kristian Skarbø. For each variety a recording and a transcription are given of the fable 'The North Wind and the Sun'. The text consists of 58 words, which are given in Appendix B, Table B.1.

When the perception experiment was carried out (see Section 7.4.1), recordings of only 15 varieties were available. The database was extended later. The results presented in this chapter are based on a set of 55 varieties. Figure 8.1 shows the geographical distribution of the dialects. The set of 55 varieties covers all nine dialect areas found on the map of Skjekkeland (1997).
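For orientation, the kind of distance that is computed from these transcriptions can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method validated in Chapter 7: the segment distances are a made-up toy table (the Barkfilter-based logarithmic distances and the length gradations are not reproduced), the indel cost of 1.0 is arbitrary, the names `levenshtein`, `toy_seg_dist` and `variety_distance` are hypothetical, and averaging over word pairs is only one plausible way of turning word distances into a distance between two varieties.

```python
def levenshtein(a, b, seg_dist, indel=1.0):
    """Weighted Levenshtein distance between two sequences of segments."""
    n, m = len(a), len(b)
    # d[i][j] = cheapest way to turn the first i segments of a
    # into the first j segments of b.
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + indel
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                             # deletion
                d[i][j - 1] + indel,                             # insertion
                d[i - 1][j - 1] + seg_dist(a[i - 1], b[j - 1]),  # substitution
            )
    return d[n][m]


# Hypothetical toy segment distances; NOT values from the Barkfilter representation.
TOY_TABLE = {frozenset(("i", "e")): 0.5}


def toy_seg_dist(x, y):
    """Return 0 for identical segments, a table value if listed, else 1."""
    if x == y:
        return 0.0
    return TOY_TABLE.get(frozenset((x, y)), 1.0)


def variety_distance(words_a, words_b, seg_dist):
    """Assumed aggregation: mean word distance over aligned word pairs."""
    pairs = list(zip(words_a, words_b))
    return sum(levenshtein(a, b, seg_dist) for a, b in pairs) / len(pairs)
```

With the toy table, `levenshtein(["s", "i", "n"], ["s", "e", "n"], toy_seg_dist)` substitutes one vowel for a similar one and costs 0.5, whereas unrelated segments would cost 1.0. Gradual segment distances of this kind are what distinguish such a variant from the plain edit distance.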
Figure 8.2 shows the distribution of the varieties over the dialect areas as given by Skjekkeland. For some locations more than one recording and transcription were available. These locations are therefore numbered in the figures in this chapter.¹

Alstahaug • The two versions are based on different recordings of different informants, the first from Sandnessjøen (Alstahaug 1) and the second from Tjøtta (Alstahaug 2). The first version is the most representative for the area of Alstahaug.

Bergen • The two versions are based on different recordings of the same informant. The older version (Bergen 1) is no longer available on the web, but was used in validation work (see Section 7.2). The newer version (Bergen 2) is the better one according to the speaker.

Bodø • The two versions are based on different recordings of the same informant. The older version (Bodø 1) is no longer available on the web, but was used in validation work. The newer version (Bodø 2) is the better one according to the speaker.

Rana • The two versions are based on different recordings by different informants, both from Rana (Rana 1 and Rana 2). The second version is more representative for the area around Rana.

Stavanger • The two versions are based on different recordings by different informants, the first from Hafrsfjord (Stavanger 1) and the second from Hundvåg (Stavanger 2). Both are equally representative of the surroundings of Stavanger, but when forced to make a choice, we select the second version.

Stjørdal • The three versions are based on different recordings of different informants, the first and the second from Stjørdal (Stjørdal 1, Stjørdal 2), and the third from Stjørdalshalsen (Stjørdal 3). The first version is the most representative. In validation work the second version was used, which was available earlier.
Time • The two versions are based on different recordings of different informants, the first from Bryne (Time 1) and the second from Undheim (Time 2). The first version is the most representative for the area of Time. This version was also used in validation work.

¹ We are grateful to Jørn Almberg (personal communication) for advice at several points below, e.g., the question as to which of two versions is the more typical for a given site.

8.2 Classification

8.2.1 Cluster analysis

Using the Levenshtein variant mentioned at the beginning of this chapter, we calculated the distances between the 55 varieties. On the basis of these distances we applied cluster analysis (see Section 6.1). Figure 8.3 gives a dendrogram showing the classification of the 55 Norwegian varieties. In the dendrogram, the distance scale shows percentages. Examining the nine most significant groups we find, from top to bottom, the dialects of Herøy and Fræna, a central group, the dialect of Bø, an eastern group, a southeastern group, a northern group, a western group and a southwestern group. The same groups are visualized geographically in Figure 8.4.

When only the five most significant groups are considered, Herøy and Fræna form one cluster. Both belong to the Nordvestlandsk varieties. However, they are not clustered with the other Nordvestlandsk varieties, which are found in the western group. When the nine most significant groups are considered, each of these two varieties appears as a separate dialect, not clustered with any of the other groups. This indicates that the two varieties are very marked dialects among the Nordvestlandsk varieties.

The central group contains the Trøndsk varieties of Sunndal and Oppdal and the geographically rather close Midlandsk variety of Lesja. It is striking that Sunndal and Oppdal are not clustered with the other Trøndsk varieties, which are for the greater part found in the northern group.
We expected that Lesja would be clustered with the other Midlandsk variety, Bø. However, just as for the set of 15 varieties, this is not the case. Geographically the two varieties are distant. The variety of Bø appears to be a separate variety which does not belong to any of the other groups. It is striking that Bø is suggested to be closest to the eastern varieties, and not to the geographically closer varieties of the southeastern group.

The Austlandsk varieties are divided into an eastern group and a southeastern group. In the southeastern group the geographically adjacent Sørlandsk varieties are found as well. More striking is the presence of the Sørvestlandsk variety of Bergen and the Trøndsk variety of Trondheim in this group. We cannot explain this. However, it is not uncommon that varieties of larger cities are dialect islands which are related to geographically more remote dialects.² Rather unexpected is that the varieties of Tynset and Lillehammer are in the southeastern group, and not in the eastern group. Lillehammer is closest to Halden. Both Halden and especially Lillehammer are nearly standard (i.e., close to bokmål), and therefore not very typical dialect versions from their respective geographic regions. The reading of Tynset is also quite standard, which may be the reason why it is judged to be closer to Oslo. In the southeastern group the two versions of Bergen, which were recorded by the same informant, do not form one cluster, but are rather close.

Figure 8.1: The geographic distribution of the 55 Norwegian varieties. Locations labelled on the map: Sør-Varanger, Nordreisa, Tromsø, Torsken, Harstad, Sortland, Vestvågøy, Bodø, Rana, Alstahaug, Brønnøy, Bjugn, Verdal, Stjørdal, Orkdal, Fræna, Trondheim, Eide, Skodje, Molde, Oppdal, Herøy, Sunndal, Tynset, Ørsta, Lesja, Gaular, Rendalen, Gulen, Trysil, Lillehammer, Vaksdal, Nordre Land, Stange, Bergen, Voss, Oslo, Bø, Borre, Stavanger, Fyresdal, Time, Larvik, Halden, Arendal, Mandal, Kristiansand.

Figure 8.2: According to Skjekkeland (1997) the Norwegian language area can be divided into nine groups. The data points on this map correspond to those in Figure 8.1. In the set of 55 varieties all dialect areas are represented. The same abbreviations are used in the other figures in this chapter. Legend: No = Nordlandsk, He = Helgelandsk, TF = Troms-Finnmarks-mål, Sv = Sørvestlandsk, Nv = Nordvestlandsk, Sø = Sørlandsk, Mi = Midlandsk, Au = Austlandsk, Tr = Trøndsk.

The largest group in the dendrogram is the northern group. It contains Nordlandsk, Helgelandsk, Troms-Finnmarks-mål and Trøndsk. The group may be divided into a Trøndsk group on the one hand, and a group containing the other varieties on the other hand. In the latter group, no systematic division between Nordlandsk, Helgelandsk and Troms-Finnmarks-mål varieties can be found. Perhaps the division into these three areas has become blurred over time. The two varieties of Rana are rather close, although they do not form one cluster. The varieties of Alstahaug are clearly more distant from each other, indicating dialect diversity in a small area. Stjørdal 1 and 2 are rather close. Compared to these two varieties Stjørdal 3 is relatively distant, again indicating strong variation in a small area. The two versions of Bodø were recorded by the same person. They neatly form one cluster.

In the western group mainly Nordvestlandsk varieties are found. The adjoining Sørvestlandsk varieties of Vaksdal and Voss are in this group as well. More surprising is that the Sørlandsk dialect of Fyresdal is also in this group. It would be more fitting if this dialect were clustered with other Sørlandsk varieties.
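The step from a distance matrix to the "k most significant groups" discussed above can be illustrated with a small sketch. It assumes average linkage (UPGMA) as the clustering criterion, which the text here does not specify (see Section 6.1 for the procedure actually used). The function name `upgma_groups`, the variety labels A to E and all distance values are invented for illustration and are not taken from the Norwegian data.

```python
def upgma_groups(names, dist, k):
    """Agglomerative clustering with average linkage, stopped at k clusters.

    names: list of labels; dist: symmetric matrix of pairwise distances.
    Returns the k groups as sorted lists of labels.
    """
    clusters = [[i] for i in range(len(names))]

    def avg(c1, c2):
        # Average linkage: mean of all pairwise distances between members.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the pair of clusters with the smallest average distance.
        a, b = min(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda p: avg(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(names[i] for i in c) for c in clusters]


# Invented toy distances between five hypothetical varieties.
names = ["A", "B", "C", "D", "E"]
dist = [
    [0.00, 0.10, 0.50, 0.60, 0.65],
    [0.10, 0.00, 0.55, 0.60, 0.70],
    [0.50, 0.55, 0.00, 0.30, 0.40],
    [0.60, 0.60, 0.30, 0.00, 0.35],
    [0.65, 0.70, 0.40, 0.35, 0.00],
]

print(upgma_groups(names, dist, 3))  # [['A', 'B'], ['C', 'D'], ['E']]
print(upgma_groups(names, dist, 2))  # [['A', 'B'], ['C', 'D', 'E']]
```

In the toy data, E stands alone when three groups are requested but is absorbed into a neighbouring cluster when only two are requested. This mirrors how a variety can appear as a separate group at one level of the dendrogram and as part of a larger cluster at a coarser level.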
