arXiv:cmp-lg/9503002v1 1 Mar 1995 n eiwrfrhlflcomments. helpful for reviewer a and sue o h aecnet rprastesame the isoglosses perhaps But or phoneme. same concept, the same for the pronunciation word for same used plot the to is where been regions has delineating ill-defined approach isoglosses, and traditional time-consuming The a process. be dialect can Unfortunately, definition planners. language educa- and re- broadcasters, tors, allocate publishers, for learners, one implications has language helps and research language lin- areas in that sources dialect language. tasks a the first approaching Knowing when the pursue of to need one guists is dialects Defining Introduction 1 hn atnKy alKprk,TmWasow, Tom Kiparsky, Paul Kay, Martin thank I ∗ pnigt ainlad(ihnIreland) (within boundaries. and provincial national to corre- sponding obtained, reason- are Gaelic, boundaries dialect to able applied is compar- distances string ison phonetic of agglomer- clustering When parti- ative of medoids. technique around top-down tioning the than bet- ter works clustering tradi- agglomerative step, clustering tional actual the entries In fa- vocabulary match. whether more dis- on Hamming based the computing tance than of accurate isoglosses, metric miliar more counting is and technique and laborious determining closely more of much correlates the This with strings. phonetic between be distance can Levenshtein sites, as of computed pair linguistic each between of in distance computation step the first The analysis, the atlas. linguistic those a as in such found transcriptions analy- phonetic of cluster sis by automatically objec- and discovered tively be can groupings Dialect opttoa ilcooyi rs Gaelic Irish in dialectology Computational Abstract tnodC 40-10USA 94305-2150 CA Stanford eateto Linguistics of Department [email protected] tnodUniversity Stanford rt Kessler Brett esos hyrpaeiolse ihadistance a with di- isoglosses multiple replace along They variation mensions. presenting summa- for and techniques these rizing several addressed developing has by 1973), issues S´eguy (1971, by ill- duced an is dialect, of boundary, notion concept. dialect very defined the the that therefore conclude and to Paris led others very conundrums and with Such major villages patterns. draw speech two similar to between arbitrary boundaries seems dialect it of north- still versus ends French), Italian the ern southern considers as differ- one (such continua when such the great of be neighbours. effects may its cumulative ences of the speech though the commu- Even one from of little speech differs the nity that such continua, alect sites. of is sets isogloss different compared the on meaningfully So based isoglosses be word. with if cannot that question and use that not incomplete ask does to partic- site meaningless a the is of it consonant word, vari- first ular how the pronounce comparing is sites When question ous the applicable. or not in- lacking, simply are be but some may line, for information a importantly, sites of More sides do two haphazardly. variants on termixed Often up line area. neatly language not the bisect neatly dimension. one how to to reduction as such dialecto- guidance perform Traditional to little gives dialect. methodology site unique logical specific a a in ge- that placed require- vary require be the impose may usually but language we dimensions, ments is, many That in ographically area. de- dialect other, of each the divisions across binary contradictory cut completely may bundles; scribing vague isoglosses forming worst, fea- at other, isoglosses different each for that approach isoglosses tures best, is At 1889:49), Paris coincide. Gaston rarely Durand, as (apud problem, first noted The frustrating. are oercnl,tefil fdaetmty sintro- as dialectometry, of field the recently, More di- have languages most that is problem third The not do isoglosses many that is problem second A ∗ matrix, which compares each site directly with all two-way partitions of the dialect area is O(2N ). other sites, ultimately yielding a single figure that The current state of dialectometry thus presents measures the linguistic distances between each pair two main questions which constitute the method- of sites. There is however no firm agreement on ological focus of this paper. The first deals with just how to compute the distance matrices. S´eguy’s distance matrices. Is there a way to build accu- earliest work (1971) was based on lexical correspon- rate distance matrices that minimize editorial deci- dences: sites differed in the extent to which they sions without discarding relevant data? My research used different words for the same concept. S´eguy suggests that this may be done by using string dis- (1973), Philps (1987), and Durand (1989) use some tances computed directly on phonetic transcriptions, combination of lexical, phonological, and morpho- and that this is better than restricting the study to logical data. Babitch (1988) described the dialec- lexical comparisons. The second deals with cluster- tal distances in Acadian villages by the degree to ing. Do bottom-up or top-down techniques work which their fishing terminology varied. Babitch and best? My conclusion is that the traditional bottom- Lebrun (1989) did a similar analysis based on the up technique works better than a typical top-down varying pronunciation of /r/. Elsie (1986) grouped method. These conclusions are based partly on an the Gaelic dialects on the basis of whether the vocab- analysis of the mathematical properties of the clus- ulary matched. Ebobisse (1989) grouped the Sawa- ters themselves, partly on how well they correlate bantu languages of Cameroon by whether phonolog- with analyses based on more traditional isogloss ical correspondences in matching vocabulary items techniques, and partly on how well they compare were complete, partial, or lacking. There seems to with previously-published descriptions of dialects in be a certain bias in favour of working with lexical a specific language, Irish Gaelic. correspondences, which is understandable, since de- At one time the Gaelic language group was spo- ciding whether two sites use the same word for the ken throughout , from where it spread to the same concept is perhaps one of the easiest linguis- Isle of Man and to much of Scotland. Currently tic judgements to make. The need to figure out such fully native use of Gaelic is limited to a few dis- systems as the comparative phonology of various lin- contiguous areas in the westernmost reaches of Ire- guistic sites can be very time-consuming and fraught land and Scotland. In the case of Ireland, everyone with arbitrary choices. agrees that Gaelic is nowadays found in three main Not all dialectometrists agree on the wisdom of dialects: that of , that of , and that delineating dialect areas. S´eguy (1973:18) insisted of Munster (O´ Siadhail, 1989). But several ques- that the concept of dialect boundaries was mean- tions are raised that are less easily answered. Do the ingless, and his emphasis on the gradience of lan- three provinces separate out so neatly for intrinsic guage similarity has been widely maintained. But linguistic reasons, or simply because their speakers those who do look for firm dialect affiliations (such have become so widely separated from each other as Babitch and Ebobisse) use bottom-up agglomer- geographically as speakers in intervening areas have ative techniques. The two linguistically closest sites adopted English? Does the language of Connacht are grouped into one dialect, and thenceforth treated naturally group with that of Ulster or with that of as a unit. The process continues recursively until all Munster? And looking beyond Ireland, many have sites are grouped into one superdialect embracing commented that the language of Ulster in general is the entire language area under consideration. This similar to that of Scotland. Are Irish, Manx, and yields a binary tree. But Kaufman and Rousseeuw Scottish Gaelic considered three separate languages (1990:44) suggest that when the emphasis in a clus- for intrinsic linguistic reasons, or because they are tering problem is on the top-level clusters—here, spoken in different countries? To a large extent, di- finding the two main dialects—then such bottom-up alectologists have found these questions difficult to methods, which can potentially introduce error at answer because they accepted Paris’s conundrum. each of several steps, are less reliable than top-down For O´ Siadhail, the ultimate scientific justification partitioning methods. Perhaps past researchers have in adopting the three-dialect account is the fact that used inferior bottom-up techniques simply because the Gaeltacht (Irish-speaking territory) is so frag- the necessary algorithms are computationally more mented nowadays that it no longer forms a contin- tractable. Comparing all possible pairs of sites is ´ 2 1 uum. O Cu´ıv(1951:4–49) felt that there can be a O(N ) problem, whereas considering all possible no dialect boundaries because transitions are grad- 1The overall algorithm is O(N 3) since for each new ual. Elsie (1986:240) considers a dialect to be an group one must compute the distances between it and area where all communities are linguistically more each of the other sites or groups. similar to each other than any community is to any site outside the dialect. Such notions provide a very whenever two sites were in the same set and 1 when- firm, absolute notion of dialecthood: a set of com- ever two sites were in contrasting sets, then tak- munities either constitutes a dialect area, or it does ing the mean. This baseline approach corresponds not. But as the dialectometrists have shown, other formally to determining distance by the number of notions of clustering are equally scientific and may isoglosses that separate sites, which is in principle more accurately correspond to intuitive notions of the traditional technique. what it means to be a dialect. This baseline was compared to several other ap- proaches. The etymon identity metric averaged the 2 Data number of times the sites agreed in using words whose stem had the same ultimate derivation. For The data for my study were taken from Wagner example, the dialects differed as to whether they 1958. Wagner administered a questionnaire to na- 2 used some form of bull- or damh- for the word ‘bul- tive speakers of Irish Gaelic in 86 sites. Most of the lock’. Etymon identity is one of the more common informants were over seventy years old and had not approaches in dialectometry; Elsie for example used spoken Irish since their youth. The atlas is therefore it in his study of the Gaelic dialects (1986). Closely an approximate reconstruction of the linguistic land- related is the idea of word identity, where the words scape of the turn of the century, when the Gaeltacht are not counted the same unless all of their mor- was more continuous. Wagner also presents mate- phemes agree. In this analysis, sites that used some rial from the Isle of Man and seven sites in Scot- form of the word bull´an, with the suffix -´an, were land. The mapped entries are presented in a very distinguished from those using the suffix -´og. narrow phonetic transcription based on the Interna- Another set of approaches for computing distance tional Phonetic Alphabet. was based on the phonetics. This computed the Volume 1 of Wagner 1958 consists of 300 maps, Levenshtein distance between phonetic strings. The plotting about 370 concepts. I used the first 51 con- Levenshtein distance is the cost of the least expen- cepts, or about 4500 different string tokens, as part sive set of insertions, deletions, or substitutions that of an ongoing project to enter all of the atlas into would be needed to transform one string into the machine readable format. These 51 concepts were other (Sankoff and Kruskal, 1983). The simplest represented by 312 different Gaelic words or phrases, technique used was phone string comparison. In this whose stems derived from 171 different etymons. approach, all operations cost 1 unit. Thus in com- paring the forms [al:i] and [ali] for eallaigh ‘cattle’, 3 Methodology and results the (minimal) distance was 2, for the substitutions 3.1 Distance matrices [a]/[a] and [l:]/[l]. (For this measure, diacritics such as the length mark ‘:’ were counted as part of the To form a baseline for comparison, I analysed the letter, and different diacritics were adjudged to make distribution of each of the 51 plotted concepts, find- for different letters.) A pair of unrelated words like ing a total of 3,337 features by which two or more [al:i] and [khruh] (for crodh, another word for ‘cat- sites differed. For example, for the concept ‘sell’, tle’) would get a much larger score, 5. I identified two sets, one using the word d´ıol (most In the above technique, very small phonetic differ- sites in Ireland), and one using the word creic (Rath- ences, such as that between a moderately palatalized lin Island, the Isle of Man, and Scotland). The di- and a very palatalized [t], count the same as major alects partitioned in a different way according to how differences, such as that between a [t] and an [e]. It much stress they placed on the verb relative to the would seem to be more accurate to assign a greater pronoun in ‘I sold’ (even stress in Dunlewy and four distance to substitutions involving greater phonetic of the Scottish sites, else extra stress on the verb). distinctions. Unfortunately I know of no comprehen- Not all partitions covered the entire map. In this sive study on the differences between phones, at least example, only sites that used the word d´ıol were not for all 277 contrasts made by Wagner. Instead compared on the basis of whether a schwa devel- I distinguished them on the basis of twelve phonetic oped in the sequence [i:l]. In some cases the divi- features that systematically account for all of the sions were more than two-way: for example, Wag- distinctions in Wagner’s inventory: nasality, stric- ner distinguishes whether the final consonant in creic ture, laterality, articulator, glottis, place, palataliza- is unpalatalized, palatalized, or slightly palatalized. tion, rounding, length, height, strength, and syllab- Distance between sites was determined by counting 0 icity. The features were given discrete ordinal values 2The atlas also maps for Kilkenny some information scaled between 0 and 1, the exact values being ar- gathered from another source. bitrary. For example, place took on the values glot- tal=0, uvular=0.1, postvelar=0.2, velar=0.3, preve- lar=0.4, palatal=0.5, alveolar=0.7, dental=0.8, and Table 1: Correlation of distance matrices to the isogloss distance matrix labial=1. The distance between any two phones was ρ K judged to be the difference between the feature val- c Phone string comparison .95 .76 ues, averaged across all twelve features. These dis- Feature string comparison tances were used instead of uniform 1-unit costs in —— all-word .92 .70 computing Levenshtein distance. The resulting met- —— same-word .91 .69 ric was called feature string comparison. Etymon identity .85 .61 It could be argued that it is meaningless to com- Word identity .84 .63 pare the phonetics of different words, as in the case of eallaigh vs. crodh mentioned above. Therefore the feature string comparison was also computed only it is for the one phone to turn into the other in the for pairs of citations that used the same word, so course of normal language change. In the method al l al that [ :i] vs. [a i] would be compared, but [ :i] vs. described here, [s] is adjudged closer to [g] than to [khruh] would be ignored. The different approaches [h]. But [s] often changes into [h] in the world’s lan- are called all-word vs. same-word feature string com- guages, and so the pair should have a small distance; parisons. whereas the change of [s] to [g] has never occurred to All of these distance matrices were compared with my knowledge, and so should have a very large dis- the isogloss matrix, to see which of them gives re- tance. The unfortunate fact that such ideal data are sults closest to that base method. I used two differ- lacking is compensated for by the fact that the in- ent methods of comparison, Pearson’s ρ computed expensive phone-string comparison employed in this between all corresponding cells in the two matrices, study performs quite well. and 3.2 Clustering techniques K = Average(sign((X − X )(Y − Y )) c ij ik ij ik The traditional agglomerative technique for cluster- which is a derivative of Kendall’s τ that Dietz (1983) ing has been described above. There is some vari- empirically found particularly accurate as a test ation in how the distance between two clusters is statistic for comparing distance matrices.3 Table 1 measured. For this study I used the average distance shows that the two measures give parallel results. between all pairs of elements that are in different More importantly, it shows that the approaches clusters. I compared agglomeration to a top-down based on string comparisons of the phonetic tran- method that Kaufman and Rousseeuw (1990) call partitioning around medoids. This model reduces scriptions correlate more highly with the isogloss N approach than do the word or etymon identity mea- the O(2 ) intractability of top-down approaches dis- sures. Furthermore, comparing whole phone letters cussed above by dramatically reducing the number works better than the more sophisticated technique of binary partitions that are considered. Here one of comparing features, and restricting comparison seeks to divide the sites into two groups by finding to pairs based on the same words does not make the the two representative sites (the medoids) around latter any better. which all the other sites cluster in such a way as to give the least average distance between the sites and Of course, I do not expect that this technique 3 using flat 1-unit costs will prove superior to all their representatives. This is therefore a O(N ) al- methods that are more sensitive to phonetic de- gorithm, comparable in efficiency to agglomeration. tails. Feature comparison may work better if fea- The process was repeated recursively on each dialect. tures were weighted differentially, or if the numeric One way of measuring how well a binary cluster- values they assume were assigned less arbitrarily, or ing technique works for dialect grouping is to com- if the Manhattan-style distance computation were pare for each site i its average dissimilarity from the replaced by some formula that did not assume that other sites in the same dialect, a(i), with its aver- the features are independent of each other. An ideal age dissimilarity from the sites in the other dialect, comparison would be based on data telling how likely b(i). Kaufman and Rousseeuw (1990:83–86) define the statistic s(i) to be 1 − a(i)/b(i) if a(i) is less 3 That is, for each site i, one considers all other pairs of than b(i), otherwise b(i)/a(i) − 1. The statistic thus sites, j and k, and asks whether the linguistic difference ranges from 1 (perfect fit) to −1 (site i would per- between i and j is greater or less than that between i and k. One counts 1 if the answer is the same for both fectly fit in the other group). Plotting this statistic distance matrices, −1 if it is different. Kc is the average gives a silhouette by which the eye can judge how of these numbers. well classified each site is. Averaging this statistic Table 2: Statistics ¯ for the top-level binary dialect division, comparing partitioning around medoids and agglomeration for the different distance matri- **** Kilkenny, Kilkenny, , Ireland **** Lough Attorick, Galway, Connacht, Ireland ces. *** Doolin, Clare, Munster, Ireland Part. Aggl. *** Fanore, Clare, Munster, Ireland *** Clear Island, Cork, Munster, Ireland Isoglosses 0.287 0.345 *** Skibbereen, Cork, Munster, Ireland *** Kinvra, Galway, Connacht, Ireland Phone string comparison 0.185 0.322 *** Coomhola, Cork, Munster, Ireland Feature string comparison *** Cloghane, Kerry, Munster, Ireland *** Kilgarvan, Kerry, Munster, Ireland —— all-word 0.252 0.353 *** Waterville, Kerry, Munster, Ireland *** Killorglin, Kerry, Munster, Ireland ——same-word 0.219 0.401 *** Glandore, Cork, Munster, Ireland Etymon identity 0.363 0.478 *** Carraroe, Galway, Connacht, Ireland *** Dursey Sound, Cork, Munster, Ireland Word identity 0.370 0.309 *** Careeny, Galway, Connacht, Ireland *** Ballymacoda, Cork, Munster, Ireland *** Newbridge, Galway, Connacht, Ireland *** Kilsheelan, Waterford, Munster, Ireland *** Conakilty, Cork, Munster, Ireland across all sites gives an idea of how felicitous the *** Craughwell, Galway, Connacht, Ireland *** Coolea, Cork, Munster, Ireland overall clustering is,s ¯. *** Rosmuck, Galway, Connacht, Ireland *** Glenflesk, Kerry, Munster, Ireland Figures 1–2 present the silhouette for clustering *** Mount Melleray, Waterford, Munster, Ireland the isogloss distance matrix by partitioning. This *** Cornamona, Galway, Connacht, Ireland *** Dunquin, Kerry, Munster, Ireland analysis produces a large dialect which groups to- *** Tourmakeady, Mayo, Connacht, Ireland *** Ring, Waterford, Munster, Ireland gether the sites in Munster, Scotland, the Isle of *** Laughanbeg, Galway, Connacht, Ireland *** Emlaghmore, Galway, Connacht, Ireland Man, and almost all sites in Connacht, as well as ** Lauragh, Kerry, Munster, Ireland Rathlin Island in Ulster; and another which groups ** Kilbaha, Clare, Munster, Ireland ** Moycullen, Galway, Connacht, Ireland together all the other sites in Ulster, as well as ** Sliabh gCua, Waterford, Munster, Ireland ** Kilmovee, Mayo, Connacht, Ireland County in Connacht. Although the Ulster ** Goatenbridge, Tipperary, Munster, Ireland group is fairly tight, with ans ¯ of 0.41, the other ** Colmanstown, Galway, Connacht, Ireland ** Inisheer, Galway, Connacht, Ireland group has a more anemics ¯ of 0.25, with the sites ** Glentrasna, Galway, Connacht, Ireland ** Letterfrack, Galway, Connacht, Ireland outside of Munster and Southern Connacht being in- ** Angliham, Galway, Connacht, Ireland ** Annaghdown, Galway, Connacht, Ireland differently classified. The weighteds ¯ for both groups ** Lough Nafooey, Galway, Connacht, Ireland comes out at 0.29. By comparison, Figures 3–4 show ** Sonnagh, Galway, Connacht, Ireland ** C´arna, Galway, Connacht, Ireland what happens when the traditional agglomerative ** Louisburgh, Mayo, Connacht, Ireland ** Inishmaan, Galway, Connacht, Ireland technique is used. The dialects of Scotland and the ** Camderry, Galway, Connacht, Ireland Isle of Man form a cluster with a great deal of inter- ** Ceathr´uan Tairbh, Roscommon, Connacht, Ire. ** Carnmore, Galway, Connacht, Ireland nal diversity (¯s = 0.12), and all the sites in Ireland ** Ballycastle, Mayo, Connacht, Ireland ** Cashel, Galway, Connacht, Ireland form another cluster averagings ¯ = 0.37, with only ** Tobercurry, , Connacht, Ireland ** Ballyglunin, Galway, Connacht, Ireland Rathlin Island being indifferently classified. The ** Aclare, Sligo, Connacht, Ireland weighted average is 0.35, which is superior to that ** Belderg, Mayo, Connacht, Ireland * Dohooma, Mayo, Connacht, Ireland of the partitioning technique. * Portacloy, Mayo, Connacht, Ireland * Blacksod, Mayo, Connacht, Ireland The same comparative results obtain for almost * Kintyre, Argyll, Scotland Isle of Man all of the distance measuring techniques. Table 2 Slievenakilla, Leitrim, Connacht, Ireland shows that thes ¯ for the first binary split is usually Inveraray, Argyll, Scotland Arran, Bute, Scotland appreciably higher for agglomeration than it is for Curraun Peninsula, Mayo, Connacht, Ireland Achill, Mayo, Connacht, Ireland partitioning. This result suggests not any inferiority Lochalsh, Ross and Cromarty, Scotland of top-down techniques in general—applying thes ¯ Assynt, Sutherland, Scotland Carloway, Lewis, Ross and Cromarty, Scotland statistic to all binary partitions would by definition Benbecula, Inverness, Scotland , Sligo, Connacht, Ireland reveal the optimal split—nor of partitioning around Rathlin Island, Antrim, Ulster, Ireland medoids in general. Rather, it appears that the as- sumption behind this partitioning heuristic, that a Figure 1: Silhouette for the first top-level binary site will be closer to the medoid of its own group dialect grouping computed on the isogloss distance than to the medoid of the other group, often fails to matrix via partitioning. Stars represent relative s(i). hold true in dialectology. The lack of clean breaks Locations in Ireland are cited by locality, county, between dialects and the fact that dialects of the province, and country. same language may vary greatly in diameter (i.e., maximal intragroup distances) both mean that the assumption will often be invalid. ***** Kildarragh, Donegal, Ulster, Ireland **** Creeslough, Donegal, Ulster, Ireland **** Colmanstown, Galway, Connacht, Ireland **** Glenvar, Donegal, Ulster, Ireland **** Moycullen, Galway, Connacht, Ireland **** Loughanure, Donegal, Ulster, Ireland **** Ceathr´uan Tairbh, Roscommon, Connacht, Ire. **** Lettermacaward, Donegal, Ulster, Ireland **** Ballyconnell, Sligo, Connacht, Ireland **** Beflaght, Donegal, Ulster, Ireland **** Carnmore, Galway, Connacht, Ireland **** Kingarroo, Donegal, Ulster, Ireland **** Annaghdown, Galway, Connacht, Ireland **** Croaghs, Donegal, Ulster, Ireland **** Lough Nafooey, Galway, Connacht, Ireland **** Aranmore, Donegal, Ulster, Ireland **** Glentrasna, Galway, Connacht, Ireland **** Gortahork, Donegal, Ulster, Ireland **** Ballycastle, Mayo, Connacht, Ireland **** Downings, Donegal, Ulster, Ireland **** Carraroe, Galway, Connacht, Ireland **** Tory Island, Donegal, Ulster, Ireland **** Cornamona, Galway, Connacht, Ireland **** Dunlewy, Donegal, Ulster, Ireland **** Aclare, Sligo, Connacht, Ireland *** Rannafast, Donegal, Ulster, Ireland **** Emlaghmore, Galway, Connacht, Ireland *** Meenacharvy, Donegal, Ulster, Ireland **** Dohooma, Mayo, Connacht, Ireland *** Teelin, Donegal, Ulster, Ireland **** Ballyglunin, Galway, Connacht, Ireland *** Ardara, Donegal, Ulster, Ireland **** Sonnagh, Galway, Connacht, Ireland *** Ballyhooriskey, Donegal, Ulster, Ireland **** Cashel, Galway, Connacht, Ireland *** Creggan, Tyrone, Ulster, Ireland **** Curraun Peninsula, Mayo, Connacht, Ireland *** Clonmany, Donegal, Ulster, Ireland **** Craughwell, Galway, Connacht, Ireland ** Omeath, Louth, Ulster, Ireland **** Belderg, Mayo, Connacht, Ireland * , Cavan, Connacht, Ireland **** Portacloy, Mayo, Connacht, Ireland **** Angliham, Galway, Connacht, Ireland Figure 2: Silhouette for the second dialect grouping **** Blacksod, Mayo, Connacht, Ireland **** Laughanbeg, Galway, Connacht, Ireland computed on the isogloss distance matrix via parti- **** Kinvra, Galway, Connacht, Ireland **** Coomhola, Cork, Munster, Ireland tioning. **** Camderry, Galway, Connacht, Ireland **** Cloghane, Kerry, Munster, Ireland **** Rosmuck, Galway, Connacht, Ireland * Carloway, Lewis, Ross and Cromarty, Scotland **** Newbridge, Galway, Connacht, Ireland * Benbecula, Inverness, Scotland **** Lough Attorick, Galway, Connacht, Ireland * Assynt, Sutherland, Scotland **** Kilmovee, Mayo, Connacht, Ireland * Inveraray, Argyll, Scotland **** Achill, Mayo, Connacht, Ireland * Lochalsh, Ross and Cromarty, Scotland *** Dursey Sound, Cork, Munster, Ireland * Kintyre, Argyll, Scotland *** Tourmakeady, Mayo, Connacht, Ireland * Arran, Bute, Scotland *** Letterfrack, Galway, Connacht, Ireland Isle of Man *** Clear Island, Cork, Munster, Ireland *** Skibbereen, Cork, Munster, Ireland *** Glandore, Cork, Munster, Ireland Figure 3: Silhouette for isogloss dialect grouping us- *** Tobercurry, Sligo, Connacht, Ireland ing agglomerative clustering, first group. *** Glenflesk, Kerry, Munster, Ireland *** Fanore, Clare, Munster, Ireland *** Careeny, Galway, Connacht, Ireland *** Doolin, Clare, Munster, Ireland *** Killorglin, Kerry, Munster, Ireland 3.3 Gaelic dialects *** Dunquin, Kerry, Munster, Ireland *** Louisburgh, Mayo, Connacht, Ireland *** Kilsheelan, Waterford, Munster, Ireland Since agglomeration is the better clustering tech- *** Waterville, Kerry, Munster, Ireland *** Kilgarvan, Kerry, Munster, Ireland nique, the best dialect analysis should be obtained *** C´arna, Galway, Connacht, Ireland by agglomerating the isogloss matrix. The best au- *** Ballymacoda, Cork, Munster, Ireland *** Conakilty, Cork, Munster, Ireland tomated approximation should be agglomerating the *** Kilbaha, Clare, Munster, Ireland *** Coolea, Cork, Munster, Ireland distance matrix computed by phonetic string com- *** Lauragh, Kerry, Munster, Ireland parison, and indeed the top-level topologies pro- *** Glangevlin, Cavan, Connacht, Ireland *** Sliabh gCua, Waterford, Munster, Ireland duced by both techniques are virtually identical. *** Mount Melleray, Waterford, Munster, Ireland *** Kingarroo, Donegal, Ulster, Ireland Both group into one loosely-connected entity all the *** Goatenbridge, Tipperary, Munster, Ireland *** Downings, Donegal, Ulster, Ireland sites in Scotland, and into another all the sites in *** Croaghs, Donegal, Ulster, Ireland Ireland. The isogloss approach groups Manx as a *** Inishmaan, Galway, Connacht, Ireland *** Glenvar, Donegal, Ulster, Ireland cousin of the Scottish dialects, and the phonetic ap- *** Ring, Waterford, Munster, Ireland *** Lettermacaward, Donegal, Ulster, Ireland proach makes it a cousin of the Irish dialects, but *** Kildarragh, Donegal, Ulster, Ireland *** Inisheer, Galway, Connacht, Ireland in both cases the s of Manx is very small (less *** Creeslough, Donegal, Ulster, Ireland than 0.06), making it essentially intermediate be- *** Gortahork, Donegal, Ulster, Ireland *** Beflaght, Donegal, Ulster, Ireland tween the two groups. Thus the popular notion that *** Kilkenny, Kilkenny, Leinster, Ireland ** Ardara, Donegal, Ulster, Ireland Scottish, Irish and Manx Gaelic are distinct entities ** Rannafast, Donegal, Ulster, Ireland is well supported by the analyses. Both analyses ** Aranmore, Donegal, Ulster, Ireland ** Dunlewy, Donegal, Ulster, Ireland group Rathlin Island very weakly with the rest of ** Meenacharvy, Donegal, Ulster, Ireland ** Loughanure, Donegal, Ulster, Ireland Irish, but the s for Rathlin is so low (less than 0.09) ** Slievenakilla, Leitrim, Connacht, Ireland ** Ballyhooriskey, Donegal, Ulster, Ireland that its grouping too is essentially arbitrary. This ** Omeath, Louth, Ulster, Ireland aligns with the fact that authorities disagree as to ** Creggan, Tyrone, Ulster, Ireland ** Clonmany, Donegal, Ulster, Ireland whether it is a dialect of Irish (as does Wagner) or ** Teelin, Donegal, Ulster, Ireland ** Tory Island, Donegal, Ulster, Ireland of Scottish (O’Rahilly 1932:191). Except for Rath- Rathlin Island, Antrim, Ulster, Ireland lin Island, both methods group the Irish sites into one group containing all the sites in Ulster, and an- Figure 4: Silhouette by agglomeration, Irish group. other, Southern, group, which itself breaks into a group containing all the sites in Connacht and one rial work. That phonetic comparison is more pre- containing all the sites in Munster.4 Both meth- cise is not particularly surprising, since etymon iden- ods agree on how the 87 sites are distributed among tity ignores a wealth of phonetic, phonological, and these dialects. This three-way division accords with morphological data, whereas comparing phones has the universal perception that Ulster, Connacht and the side effect of also counting higher-level variation: Munster form the three major dialect groups. The if words differ in morphemes, their phonetic differ- special status of Ulster contradicts the position of ence is going to be high. As for clustering the sites O’Rahilly (1932:18) that Connacht groups with Ul- into dialect areas, the familiar bottom-up agglomer- ster to form a Northern dialect over against Mun- ation method proves superior to top-down partition- ster. But it agrees with Elsie’s finding (1986:255) ing around medoids. that that province is lexicostatistically more remote Of course simply knowing the dialect areas is not from Connacht and Munster than those two are from the last word in dialectology. There remain such es- each other. Furthermore, Hindley reports (1990:63) sential problems as identifying the differing linguis- that speakers of other dialects often switch off radio tic structures that characterize the dialects, and dis- broadcasts in , ‘which is very distinctive’. covering their history. But all of these tasks will be Thus the classification of the major Gaelic di- greatly facilitated by a quick and accurate grouping alects gives the same general results by both dis- of the dialects. tance metrics, if one discounts Manx and Rathlin Is- land Gaelic, which are flagged as indifferent in both schemes. It is encouraging that the resultant dialect References areas are continuous, align with traditional provin- cial boundaries, and agree with commonly accepted [Babitch1988] Babitch, Rose Mary (1988). “The taxonomies. However, dialect groupings at narrower areal structure of three syntagmatic variables in the terminology of Acadian fishermen.” In Papers levels, such as the exact subgrouping of the major from the Tenth Annual Meeting of the Atlantic provincial dialects, are at this point unstable. This is Provinces Linguistic Association, pages 121–137, no doubt to be explained by the fairly small number University College of North Wales, August 1987. of mapped concepts on which the distance metrics Multilingual Matters, Clevedon, Avon. are based (51).5 As language differences get smaller, one expects that more data will be required in order [Babitch and Lebrun1989] Babitch, Rose Mary, and to elucidate them. Lebrun, Eric (1989). “Dialectometry as com- puterized agglomerative hierarchical classification 4 Conclusions analysis.” Journal of English Linguistics, 22:83– 90. This experiment shows that an automatic procedure can reliably group a language into its dialect areas, [Dietz1983] Dietz, E. Jacquelin (1983). “Permuta- starting from nothing more than phonetic transcrip- tion tests for association between two distance ma- tions as commonly found in linguistic surveys. Ac- trices.” Systematic Zoology, 32:21–26. curate distance matrices, corresponding highly to those obtained by the tedious uncovering of thou- [Durand1889] Durand, J.-P. (1889). “Notes de philologie rouergate, 18”. Revue des langues ro- sands of isoglosses, can be obtained by averaging manes, 33:47–84. the Levenshtein distance between phonetic strings, weighting equally all insertion, deletion, and substi- [Ebobisse1989] Ebobisse, Carl (1989). “Dialectom´e- tution operations on the constituent phones. This trie lexicale des parlers sawabantu.” Journal of turns out to be quite a bit more precise than the West African Languages, 19,2:57–66. common technique of measuring distances by judg- ing etymon identity, and requires even less edito- [Elsie1986] Elsie, Robert (1986). Dialect relation- 4 ships in Goidelic: A study in Celtic dialectology. The one site in Co. Cavan is intermediate between Helmut Buske, Hamburg. the Ulster and the Southern group. Wagner also gives two sites in Leinster. The more southern site, in [Hindley1990] Hindley, Reg (1990). The death of the Kilkenny, groups with the Southern group, and the more northern site, in Co. Louth, groups with Ulster, and . Routledge, London. indeed the county used to be considered part of that province. [Kaufman and Rousseeuw1990] Kaufman, Leonard, 5S´eguy (1973) cites empirical research suggesting and Rousseeuw, Peter J. (1990). Finding groups that general dialectometry requires about a hundred in data: An introduction to cluster analysis. John concepts. Wiley & Sons, New York. [O´ Cu´ıv1951] O´ Cu´ıv, Brian (1951). Irish dialects and Irish speaking districts. Dublin Institute for Advanced Studies. [O’Rahilly1932] O’Rahilly, Thomas Francis (1972). Irish dialects past and present. Dublin Institute for Advanced Studies. [O´ Siadhail1989] O´ Siadhail, M´ıche´al (1989). Modern Irish: Grammatical structure and dialectal vari- ation. Cambridge University Press, Cambridge, Mass. [Philps1987] Philps, Dennis (1987). “La rela- tion entre distance linguistique et distance spa- tiale dans l’Atlas linguistique de la Gascogne.” In Actes du Premier Congr`es International de l’Association Internationale d’Etudes´ Occitanes, 551–569. Westfield College, University of London. [Sankoff and Kruskal1983] Sankoff, David, and Kruskal, Joseph B., eds. (1983). Time warps, string edits, and macromolecules: The theory and practice of sequence comparison. Addison-Wesley, Reading, Mass. [S´eguy1971] S´eguy, Jean (1971). “La relation entre la distance spatiale et la distance lexicale.” Revue de linguistique romane, 35:335–357. [S´eguy1973b] S´eguy, Jean (1973). “La dialectom´etrie dans l’Atlas linguistique de la Gascogne.” Revue de linguistique romane, 37:1–24. [Wagner1958] Wagner, Heinrich (1958). Linguistic atlas and survey of Irish dialects. Dublin Insti- tute for Advanced Studies, 1958–1969. 4 vol.