
Proc. of the 4th Language & Technology Conference, November 6-8, 2009, Poznań

Pronunciation and Writing Variants in Luxembourgish: The Case of Mobile N-Deletion in Large Corpora

Natalie D. Snoeren & Martine Adda-Decker

LIMSI-CNRS, BP 133 91403 Orsay cedex, FRANCE {natalie.snoeren, madda}@limsi.fr

Abstract

The national language of the Grand-Duchy of Luxembourg, Luxembourgish, has often been characterized as one of Europe's under-described and under-resourced languages. Because of a limited written production of Luxembourgish, poorly observed writing standardization (as compared to other languages such as English and French) and a large diversity of spoken varieties, the study of Luxembourgish poses many interesting challenges to automatic speech processing studies as well as to linguistic enquiries. In the present paper, we make use of large corpora to focus on typical pronunciation and writing variants in Luxembourgish, elicited by mobile -n deletion (hereafter shortened to MND). Using over 10 million words from transcribed Parliament debates and 10k words from news reports, we examine the reality of MND variants in written transcripts of speech. The goal of this study is threefold: to quantify the potential for variation due to MND in written Luxembourgish, to check the mandatory status of the MND rule, and to discuss the resulting problems for automatic processing of spoken Luxembourgish.

1. Introduction

Luxembourg is a small, landlocked country in Western Europe, bordered by Belgium, France and Germany. The official language Luxembourgish ("Lëtzebuergesch") is the language spoken by native Luxembourgers. From a linguistic typological point of view, Luxembourgish belongs to the West Central dialects of High German and is therefore part of the Germanic languages. Luxembourgish can be considered a language with strong Romance and Germanic influences. It is estimated that about 300,000 people worldwide speak Luxembourgish. Although Luxembourgish has been the national language of Luxembourg since 1984, French and German remain the other administrative languages. Because Luxembourgish is embedded in this multilingual context, it may entail frequent code-switching. Indeed, there are virtually no pure monolinguals in Luxembourg and people switch from one language to another fairly easily.

Therefore, the linguistic situation in Luxembourg poses a real challenge for researchers concerned with both automatic and natural language processing. As was previously pointed out (Adda-Decker et al., 2008; Krummes, 2006), Luxembourgish should be considered as a partially under-resourced language, mainly because written production remains relatively low. Rather surprisingly, written Luxembourgish is not systematically taught to children in primary school: German is usually the first written language learned, followed by French (Berg and Weis, 2005). As was reported by Adda-Decker et al. (2008), a relatively important production of Lëtzebuergesch literature was observed throughout the 19th century, and a number of proposals for standardizing the orthography of Luxembourgish can be traced back to the middle of the 19th century. The question of appropriate spelling rules has been open since that time. There was no officially recognized spelling system until the adoption of the "OLO" (ofizjel lezebuurjer ortografi) in 1946, which aimed at producing written forms that clearly diverge from German orthography. The success of these rules remained very limited. A more successful standardization eventually emerged from the work of a number of specialists charged with the task of creating a dictionary, which was published between 1950 and 1977 (Linden, 1950). This dictionary underwent some modifications and was officially adopted in the spelling reform of 1999. Nonetheless, up until today, German and French are the most practiced languages for written administrative purposes and communication in Luxembourg, guaranteeing a larger dissemination, whereas Lëtzebuergesch is mainly used for oral communication.

The strong influence of both German and French, among other factors, can explain the fact that Luxembourgish exhibits a large number of pronunciation variants and of potential writing variants derived from them. It is common to have pronunciations changing from one place to another within an area of only several kilometers, especially for function words (e.g., the English personal pronoun "our" can be written and pronounced as eis [ajs], ons, or is [i:s], even though the standard form is considered to be eis). These pronunciation variants may give rise to corresponding variations in written Luxembourgish, as Luxembourgish orthography strives for phonetic accuracy (Schanen, 2004). The question then arises, in particular for oral transcripts, whether the written form reflects the perceived pronunciation form or whether some sort of normalization process is at work that eliminates part of the variation. Thus, Luxembourgish is predominantly a spoken language that tends to reproduce the observed variations when written. With respect to automatic speech recognition, text normalization is an important issue in order to achieve reliable estimates for n-gram based language models. In the current paper we will address some important issues related to writing and pronunciation variants, in particular those that are elicited by the rule of mobile n-deletion. Before turning to this variant, we will first discuss the relevance of pronunciation and writing variants for automatic speech processing.

1.1. ASR and the study of pronunciation and writing variants

Speech is known to be highly variable, and major factors that contribute to the variation concern phonemic context, speaker identity (gender, age, health, emotion), speaking style and rate, communication contexts as well as environmental and recording conditions. Variation must be addressed globally for Automatic Speech Recognition (ASR) systems to produce faithful orthographic transcriptions. ASR has made tremendous progress in the study of pronunciation variants over the last decade, with a significant decrease in recognition error rates. Present challenges concern improvement of language and pronunciation modeling. The problem of modeling pronunciation variants appears to be crucial, especially for spontaneous speech, and many efforts have been spent over the last years on pronunciation variants by the ASR community (Strik and Cucchiarini, 1999). Reductions typically produce pronunciation variants, yielding either different (centralized) vowels (van Son and Pols, 2003), fewer phonemes, or even fewer syllables (Adda-Decker et al., 2005). In terms of lexical effects, reductions seem to affect the parts of speech that carry the least communicative value, such as function words that are highly predictable from the surrounding context (e.g., French c'est-à-dire, "that is"), morphological items, in particular endings, dates, discourse markers in spontaneous speech and so forth. As the acoustic word models are obtained by phone model concatenation according to the pronunciation dictionary, appropriate variant descriptions are required. The commonly adopted acoustic HMM (Hidden Markov Model) structure can implicitly account for some amount of speech lengthening, especially stemming from hesitation phenomena, and for parallel variants (Adda-Decker and Lamel, 1999). However, pronunciations with a number of phonemes differing from the one specified in the pronunciation dictionary are generally poorly dealt with. Although insertions may occur (e.g., in French), the most problematic situation arises with missing phonemes. Acoustic phone models may implicitly capture limited reductions. For example, schwa models may simply represent the surrounding consonants, thus modeling schwa deletion. The drawback is then the loss of phonemic genericity; reductions beyond the context-dependent phone model scope (i.e., generally triphones) necessarily need to be explicitly represented in the pronunciation dictionary. There are several important issues that concern the elaboration of high-quality pronunciation dictionaries. First, it is important to get a synthetic view of major writing and pronunciation variant phenomena as well as the effective application of the orthographic writing conventions. Given the specificities of Luxembourg, it appears important to check the variations arising from the different languages in contact in Luxembourg. We can then focus on Luxembourgish-specific phonological phenomena.

Given the rich multilingual context of Luxembourg, Adda-Decker et al. (2008) have initiated some preliminary investigations to measure the number of lexicon entries shared between major European languages (French, German, English, Spanish) and compared this with Luxembourgish. To this end, the authors used corpora from the Chamber (House of Parliament) debates. Other resources included data from news channels delivered by the Luxembourgish radio and television broadcast company RTL. As was shown in previous studies, a word sort by frequency typically tends to put function words at the top of the list, followed by general language items, technical items and finally proper names. Word list comparisons between pairs of languages showed two noteworthy differences (Adda-Decker et al., 2008). First, the percentage of Luxembourgish words shared with French was particularly high. This is hardly surprising, since French is largely used in administrative and official speech in Luxembourg. Second, the curves for Luxembourgish showed that the contribution of shared frequent words is large, whereas the part corresponding to proper names remains relatively small in these data. Similar comparisons between French and English from broadcast news data showed an important part of shared proper name forms but almost no function word forms. Furthermore, the authors showed that a significant amount of French and German imports is a major characteristic of the Luxembourgish language. These imports may give rise to variation in written and spoken forms. French imports may be pronounced according to their native system or adapted to Luxembourgish (e.g., the au sequence giving rise to two pronunciations, [aw] or [o]). Similar cases can be observed for German imports that are adapted to Luxembourgish (e.g., the German suffix -ung may be pronounced and written either with u or with o, as in Stëmmung or Stëmmong). There are also multilingual entries (e.g., ville, meaning "city" [vil] in French and "many", pronounced with an initial [f], in Luxembourgish). Given these pronunciation variants, specific tools are currently being developed in order to determine the origin(s) of a particular word (Luxembourgish, French, German, English...), as spelling rules are partially inherited from the source language. Luxembourgish, and more generally languages in multilingual contexts, therefore introduce new and interesting challenges to the development of pronunciation dictionaries.

1.2. The case of mobile n-deletion (MND)

In the following we focus on Lëtzebuergesch-specific variants, which are not due to imports. Among previously studied pronunciation variants in Luxembourgish, we can include mobile n-deletion (hereafter shortened to MND, following Krummes (2006)), also known as the Eifeler rule (Gilles, 2005; Schanen and Lulling, 2003). This phonological rule states that a word-final -n is retained before a vowel or before one of the following phonemes: {n, d, t, ts, h}. Any other phonemic right context causes the deletion of the final -n. In linguistic theory, MND is usually considered to be a categorical phonological process (see Gilles, 2005, for a non-linear phonological account of the rule). It is assumed that MND does not give rise to any graded phonetic transformation in which fine phonetic traces of the underlying /n/ are still present in the speech signal after application of the rule, unlike for instance French voice assimilation (Snoeren et al., 2006). Moreover, word-final -n can also be deleted within compound-word boundaries. That is, the first element of compound words ending in -n generally undergoes MND. So, for instance, given a first element such as the word Fritten ("French fries"), the -n is preserved before /d/ as in Frittendëppen ("chip pan"), but generally deleted before /f/ as in Frittefett ("frying fat"). Prefixes ending in -n also undergo MND. The preposition an ("in"), prefixed to the verb droen (Ger. "tragen", Eng. "to carry"), results in androen, whereas prefixed to a word such as Fet (Fr. "gras", Eng. "fat"), it results in the verb afetten ("to grease").

Although MND is nowadays presented as a systematic spelling rule, its correct application still depends on a knowledge of spoken Luxembourgish, as was pointed out by Schanen (2004). Gilles (2005), among others, has defined exceptions under which MND does not apply, with corresponding explanations. Word endings with a short vowel followed by -n or -nn (e.g., Mann, "man"; Sënn, "sense") always keep their final -n because these words carry the Middle-Franconian tone accent, which triggers a non-contrastive lengthening of the nasal [n:] (Gilles, 2005). Apart from these exceptions, other factors may play a role as well (specific parts of speech, proper names, imports etc.). These exceptions have not necessarily been described yet, and therefore a large corpus study such as the one we are currently working on would be useful in this regard. We are developing tools that specify all word pairs obeying specific word-boundary patterns or phoneme sequences. The application of MND can then easily be verified for these word pairs. For instance, given the written sequence -u(nn) k-, with the underlying phonemic word boundary sequence /u(n).k/, the final -n phoneme should be deleted in verb sequences such as geflunn komm ("came flying"), thereby observing the MND rule. In a noun-verb sequence such as d'Bunn kënnt ("the train is arriving"), the final -n should be kept. Proper names such as Hélène should also keep their final -n.

In a constrained word pair context, Krummes (2006) investigated MND during speech production in a naming task in which the determiner den ("the") was inserted in a number of phrases. Since Luxembourgish uses determiners with proper names, it was expected that the determiner would be pronounced de or den, depending on the following context, for which male forenames were used. It was found that speakers deleted the word-final -n before the above-mentioned subset of consonants. This suggests that speakers' intuitions are in line with the linguistic rules pertaining to MND. Nonetheless, participants' responses were found to be less coherent in pronouncing the word-final -n before the expected initial phoneme /z/ (e.g., Si(nn) si de(n) Samson siche gaangen?). The author suggests that this variability depends on whether or not the final -n occurs in function words ending with a schwa. Moreover, MND is overall observed more frequently within function words as opposed to content words. However, it is not clear whether the exceptions are simply not well known, and whether we are dealing with a categorical type of linguistic rule. The data of only five participants were used in the survey, and it remains to be seen whether the tendencies reported in the Krummes (2006) study still exist with a much larger number of participants and more complex MND-deletable contexts.

In the current contribution, we propose to investigate the written and pronunciation variants in Luxembourgish that are elicited by MND by looking into large transcribed corpora (Adda-Decker et al., 2008), i.e., manual transcriptions of recorded speech from either the Chamber debates or web news reports. By doing so, we will be in an excellent position to characterize this pronunciation variant and to establish with what kinds of variants the Luxembourgish listener is actually confronted.

2. The current study

2.1. Data selection

Sibling resources that provide both audio and corresponding written materials are of major interest for ASR development. The most interesting resource we have come across so far for Luxembourgish consists of the Chamber debates (House of Parliament) and, to a lesser extent, news channels that are delivered by the Luxembourgish radio and television broadcast company. The Parliament debates are broadcast and made available on the official web site (www.chd.lu), together with written Chamber reports that correspond to fairly reliable manual transcripts of the oral debates. Another interesting sibling resource stems from the Luxembourgish radio and television broadcast company RTL, which produces news written in Luxembourgish on its web site (www.rtl.lu), together with the corresponding audio data. However, it must be noted that only a very limited amount of written Luxembourgish can be found here, whereas RTL has a profuse audio/video production. Table 1 summarizes the different text and audio resources that are currently being collected for analysis.

            written             sibling: audio+written
Source:     WIKIPEDIA           CHAMBER        RTL
            lb.wikipedia.org    www.chd.lu     www.rtl.lu
Volume:     500k                12M            700k
Years:      2008                2002-2008      2007-2008

Table 1: Major Luxembourgish text and audio sources for ASR studies. Collected amounts are given in word numbers. (Adapted from Adda-Decker et al. (2008).)

2.2. Potential mobile -n sites

As was mentioned before, MND concerns the deletion of a word-final -n, giving rise to a variant of the same lexical item. Following the official Luxembourgish orthography, Luxembourgish words such as wann and wa ("when") are both recognized as existing lexical items and, as such, listed in the dictionary. Because our corpora contain items that can occur without word-final -n, with -n, or with double -n, we first sought to know how many Luxembourgish word-final -n (or -nn) words also occur without a word-final -n (or -nn). These items correspond to potential MND sites. To this end, an extraction tool was developed and implemented as a PERL script that took as input the word list derived from the word tokens of the corpora (cf. Table 1), and produced as output a compressed word list that merges all word-final -n variants, using an annotation that marks word-final -n (or -nn) items that also exist without -n. A few examples are given below:

gezwonge#n ⇒ gezwonge; gezwongen (Eng. "forced")
ausgi#nn ⇒ ausgi; ausginn (Eng. "spent")
si#n#nn ⇒ si; sin; sinn (Eng. "are")

The input word list from the transcriptions includes 194k distinct word forms. The correct orthography of these words can be checked using the official Luxembourgish spelling checker developed by the Centre de Recherche Public G. Lippmann (for more information see http://cortina.lippmann.lu/site/index.php) with the support of the CPLL (Conseil Permanent pour la Langue Luxembourgeoise). This check makes it possible to list all the words that are considered to be officially admissible Luxembourgish word forms. This officially correct list is termed here the Cortina list and includes 121k words. As such, the word list can be thought of as a standardized type of dictionary, contrary to the word lists that are derived from the transcriptions. The results of our word-final -n variant merging are summarized in Table 2.

-n variants    Transcriptions    Cortina
-              194k              121k
#n             30318 (15.6)      5894 (4.8)
#nn            583 (0.3)         101 (0)
#n#nn          15 (0)            136 (0)

Table 2: Word type frequencies (%) of potential mobile -n items and variants as found in the lists derived from the transcribed corpora and in the Cortina list (official orthography). The first line indicates the full word list sizes.

As can be seen from Table 2, there is a relatively large number of word-final -n items (4.8% of the word types) that also occur without the final -n, according to the Cortina list. This proportion more than triples in the Transcriptions list (15.6%), which is not surprising as human transcriptions generally allow for more variation, including potential errors. Another issue might be that the Cortina spell checker did not include all the possible variants due to MND. The large number of additional word-final -n variants may arise from genuine variation in the produced speech due to the MND process. In further studies this point will be investigated, in particular by confronting sibling written and oral modalities. Although the number of #n#nn type items in the Cortina list is very low (136 items), it is interesting to note that this type of item hardly occurs at all in the transcriptions. One possible explanation might be related to avoidance of redundancy when transcribing (i.e., two orthographic representations correspond to the same phonetic variant). These raw measurements provide us with some interesting clues about potential mobile -n sites in Luxembourgish. The fact that a lot of the resulting MND variants are already listed in word lists might be helpful in explaining under what circumstances MND occurs in Luxembourgish speech. Nonetheless, a more in-depth analysis is clearly called for to see whether the number of potential -n sites varies as a function of other linguistic factors (e.g., syntactic information).

2.3. MND in transcriptions

The goal of a second investigation was to find out whether the MND rule is respected in two transcriptions from the Chamber debates and one transcription from a news channel (transcribed by professional transcribers who are native speakers of Luxembourgish). A PERL script was implemented to count the number of lexical items containing a word-final -n in the phonemic contexts in which MND occurs. Table 3 summarizes the word token counts and the respective frequencies (%) of violations of the MND rule (taking into account the exceptions to the rule, such as word-final -ioun, where word-final -n is always retained).

Transcription:    Ch1        Ch2       News
                  (12395)    (1952)    (2326)
MND viol.:        0.39       0.46      2.53

Table 3: Word token frequencies (%) and MND violation type frequencies (%) for three transcriptions. The first line indicates the full word list size.

These numbers suggest that there are relatively few cases for which the MND rule is violated. The violations include for the most part nouns followed by prepositions (Bühn fir), which in this particular example should not be considered as an MND violation but as an MND exception. Further examples include nouns followed by verbs (Kirchen gét), which in this case is a genuine MND violation. Thus, the MND rule is fairly well respected, and these results make even more sense in the light of the relatively large number of listed variants resulting from MND that was mentioned before. In order to verify this hypothesis, however, the next step would be to collect more linguistic (i.e., syntactic) information about the type of items that undergo MND and to see whether this information correlates with the potential mobile -n words that are listed in the dictionaries and recognized as lexical items in their own right by Luxembourgish listeners. Finally, transcriptions need to be checked against oral productions to clarify whether MND is similarly respected in the oral modality.

2.4. MND and word list coverage

Language model development in ASR requires that the word lists that are used achieve high lexical coverage. Following Adda-Decker et al. (2008), we wanted to quantify the impact of mobile -n variants on lexical coverage in Luxembourgish. To this end, we used the Chamber corpus, which consists of 12M raw words, as training data to build word lists of different sizes (i.e., system vocabularies). A held-out development set of 100k raw words was then used to measure the percentage of words covered by the different word lists on new data. The complementary measure of unknown words, termed Out of Vocabulary (OOV) words, is displayed in Figure 1 as a function of word list size (varying between 10k and 150k lexical items). The corresponding curves show the impact of MND, that is, after filtering out all word-final -n items, on the word list's global lexical coverage capacity. As can be seen from the figure, OOV rates overall decrease as the word list size increases. More importantly, the difference between the MND filtering and the standard development data is relatively large at a low word list size. However, the difference between the two curves reduces as the word list size increases (beyond 80k). This effect obviously calls for further investigation, such as comparisons with other word-final phonemes, but it illustrates well how automatic tools developed in ASR can highlight linguistic phenomena such as Luxembourgish MND.

Figure 1: Out of Vocabulary (OOV) word rates measured as a function of word list size for the Chamber standard development data (black curve) and after MND filtering (red curve).

3. Summary and prospects

In the present contribution, we have tried to draw attention to the complex linguistic situation of Luxembourgish, a partially under-resourced and under-described language. For ASR development, the use of sibling resources that provide similar contents in both written and oral/auditory modalities is extremely useful. Although there are relatively few written resources in Luxembourgish as compared to other European languages, corpus studies in Luxembourgish will substantially add to the current debate on the processing of pronunciation variants in automatic and natural speech processing. An important question raised by the ASR community is whether the variation is modeled at the lexical level or handled by the acoustic models. As was pointed out by Strik (2001), simply adding pronunciation variants at a lexical level will not necessarily result in better recognition performance because of an increased confusability. It has previously been shown that better recognition performance can be obtained when taking into account the probabilities of pronunciation variants, either at the lexical level or in the acoustic models (Strik, 2001). This information can be readily derived from the type of large corpus-based analyses we are proposing here. Moreover, in order to assess pronunciation variants, it seems that representative data are needed. New methods that are based on pronunciation rules, rather than on the variants directly, can be used to generalize over variants unseen in the training data. In this respect, Luxembourgish MND provides an excellent test case, as the variants elicited by MND are governed by a linguistic rule. Computational ASR investigations and corpus-based analyses will not only enhance the development of a more full-fledged ASR system for Luxembourgish, but can also be used to generate more specific predictions about the role of the actual experience that listeners have with pronunciation variants. In turn, these predictions can then be tested in other domains such as psycholinguistics. Indeed, over the last decade a number of studies have looked into the perceptual processing of pronunciation variants, most notably assimilation, in spoken words (Gaskell and Marslen-Wilson, 1996; Snoeren et al., 2009). Another important issue pertains to the role of orthography in the processing of pronunciation variants, an aspect that was mentioned before in the current contribution. A critical aspect in the debate on lexical representations and their phonological structure is whether distinguishing pronunciation variants (e.g., those elicited by n-deletion) has to do with auditory perceptual abilities or whether explicit information about the contrastive sounds may be needed to build separate lexical representations. Given the implications of large corpus-based analyses, it is hoped that this line of research on Luxembourgish will spark more interest for the language in researchers working in the domains of ASR, cognitive psychology, and linguistics.

4. References

M. Adda-Decker and L. Lamel. 1999. Pronunciation variants across systems, languages and speaking style. Speech Communication, 29:83–98.

M. Adda-Decker, P. Boula de Mareüil, G. Adda, and L. Lamel. 2005. Investigating syllabic structures and their variation in spontaneous French. Speech Communication, 46:119–139.

M. Adda-Decker, T. Pellegrini, E. Bilinski, and G. Adda. 2008. Developments of Lëtzebuergesch resources for automatic speech processing and linguistic studies. In LREC.

C. Berg and C. Weis. 2005. Sociologie de l'enseignement des langues dans un environnement multilingue. Rapport national en vue de l'élaboration du profil des politiques linguistiques éducatives luxembourgeoises. Technical report.

M.G. Gaskell and W.D. Marslen-Wilson. 1996. Phonological variation and lexical access. Journal of Experimental Psychology: Human Perception & Performance, 22:144–158.

P. Gilles. 2005. Phonologie der n-Tilgung im Moselfränkischen ('Eifler Regel'): Ein Beitrag zur dialektologischen Prosodieforschung. Perspektiven einer linguistischen Luxemburgistik - Studien zu Diachronie und Synchronie. Universitätsverlag WINTER.

C. Krummes. 2006. Sinn si or si si? Mobile-n deletion in Luxembourgish. In Papers in Linguistics from the University of Manchester: Proceedings of the 15th Postgraduate Conference in Linguistics, Manchester.

P. Linden. 1950. Luxemburger Wörterbuch. P. Linden, Hofbuchdrucker.

F. Schanen and J. Lulling. 2003. Introduction à l'orthographe luxembourgeoise. In www.cpll.lu/ortholuxs_l.html, G.-D. de Luxembourg.

F. Schanen. 2004. Parlons Luxembourgeois. L'Harmattan.

N.D. Snoeren, J. Segui, and P.A. Hallé. 2006. A voice for the voiceless: Production and perception of voice assimilation in French. Journal of Phonetics, 34:241–268.

N.D. Snoeren, M.G. Gaskell, and A.M. Di Betta. 2009. The perception of assimilation in newly learned words. Journal of Experimental Psychology: Learning, Memory & Cognition, 2(4):542–549.

H. Strik and C. Cucchiarini. 1999. Modeling pronunciation variation for ASR: A survey of the literature. Speech Communication, 29:115–246.

H. Strik. 2001. Pronunciation adaptation at the lexical level. In ISCA Tutorial and Research Workshop, Sophia-Antipolis, France.

R.J.H. van Son and L.C.W. Pols. 2003. An acoustic model of communicative efficiency in consonants and vowels taking into account context distinctiveness. In 15th International Conference of Phonetic Sciences, Barcelona, Spain.

Acknowledgements

The research has been supported by a grant from the Luxembourgish F.N.R. (Fonds National de la Recherche) awarded to N.D.S.
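Illustrative sketch

The variant merging of Section 2.2 and the coverage measure of Section 2.4 can be illustrated with a short Python sketch. This is not the PERL tooling used in the study. The function names `merge_n_variants` and `oov_rate` are hypothetical, and the decision to report a form such as sin only under its shortest attested stem (si) is an assumption made for this example. The MND filtering step of Section 2.4 is not reproduced here.

```python
from collections import Counter


def merge_n_variants(word_list):
    """Group word forms that differ only by a word-final -n or -nn.

    Returns a dict mapping an annotated entry such as "si#n#nn" to the
    attested variants ["si", "sin", "sinn"]. Assumption of this sketch:
    a form is reported only under its shortest attested stem.
    """
    vocab = set(word_list)
    merged = {}
    for stem in vocab:
        # Skip forms that are themselves the -n variant of a shorter
        # attested form (e.g. "sin" when "si" is in the list).
        if stem.endswith("n") and stem[:-1] in vocab:
            continue
        # Collect the -n / -nn suffixes attested on top of this stem.
        suffixes = [s for s in ("n", "nn") if stem + s in vocab]
        if suffixes:
            key = stem + "".join("#" + s for s in suffixes)
            merged[key] = [stem] + [stem + s for s in suffixes]
    return merged


def oov_rate(train_tokens, dev_tokens, size):
    """Percentage of development tokens that are absent from the `size`
    most frequent word forms of the training data."""
    vocab = {w for w, _ in Counter(train_tokens).most_common(size)}
    unknown = sum(1 for tok in dev_tokens if tok not in vocab)
    return 100.0 * unknown / len(dev_tokens)
```

Given the example forms quoted in Section 2.2 as input, `merge_n_variants` produces the annotated entries gezwonge#n, ausgi#nn and si#n#nn. Calling `oov_rate` for a range of `size` values on real training and development token lists would give one curve of the kind shown in Figure 1.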