
Name pronunciation in German text-to-speech synthesis

Stefanie Jannedy
Linguistics Dept., Ohio State University
Columbus, OH 43210, USA
jannedy@ling.ohio-state.edu

Bernd Möbius
Language Modeling Research, Bell Laboratories
Murray Hill, NJ 07974, USA
bmo@research.bell-labs.com

Abstract

We describe the name analysis and pronunciation component in the German version of the Bell Labs multilingual text-to-speech system. We concentrate on street names because they encompass interesting aspects of geographical and personal names. The system was implemented in the framework of finite-state transducer technology, using linguistic criteria as well as frequency distributions derived from a database. In evaluation experiments, we compared the performances of the general-purpose text analysis and the name-specific system on training and test materials. The name-specific system significantly outperforms the generic system. The error rates compare favorably with results reported in the research literature. Finally, we discuss areas for future work.

1 Introduction

The correct pronunciation of names is one of the biggest challenges for text-to-speech (TTS) conversion systems. At the same time, many current or envisioned applications, such as reverse directory systems, automated operator services, catalog ordering or navigation systems, to name just a few, crucially depend upon an accurate and intelligible pronunciation of names. Besides these specific applications, any kind of well-formed text input to a general-purpose TTS system is extremely likely to contain names, and the system has to be well equipped to process these names. This requirement was the main motivation to develop a name analysis and pronunciation component for the German version of the Bell Labs multilingual text-to-speech system (GerTTS) (Möbius et al., 1996).

Names are conventionally categorized into personal names (first and surnames), geographical names (place, city and street names), and brand names (organization, company and product names). In this paper, we concentrate on street names because they encompass interesting aspects of geographical as well as of personal names. Linguistic descriptions and criteria as well as statistical considerations, in the sense of frequency distributions derived from a large database, were used in the construction of the name analysis component. The system was implemented in the framework of finite-state transducer (FST) technology (see (Sproat, 1992) for a discussion focussing on morphology). For evaluation purposes, we compared the performances of the general-purpose text analysis and the name-specific systems on training and test materials.

As of now, we have neither attempted to determine the etymological or ethnic origin of names, nor have we addressed the problem of detecting names in arbitrary text. However, due to the integration of the name component into the general text analysis system of GerTTS, the latter problem has a reasonable solution.

2 Some problems in name analysis

What makes name pronunciation difficult, or special, in comparison to words that are considered as regular entries in the lexicon of a given language? Various reasons are given in the research literature (Carlson, Granström, and Lindström, 1989; Macchi and Spiegel, 1990; Vitale, 1991; van Coile, Leys, and Mortier, 1992; Coker, Church, and Liberman, 1990; Belhoula, 1993):

• Names can be of very diverse etymological origin and can surface in another language without undergoing the slow linguistic process of assimilation to the phonological system of the new language.

• The number of distinct names tends to be very large: For English, a typical unabridged collegiate dictionary lists about 250,000 word types, whereas a list of surnames compiled from an address database contains 1.5 million types (72 million tokens) (Coker, Church, and Liberman, 1990). It is reasonable to assume similar ratios for German, although no precise numbers are currently available.

• There is no exhaustive list of names; and in German and some related Germanic languages, street names in particular are usually constructed like compounds (Rheinstraße, Kennedyallee) which makes decomposition both practical and necessary.

• Name pronunciation is known to be idiosyncratic; there are many pronunciations contradicting common phonological patterns, as well as alternative pronunciations for certain grapheme strings.

• In many languages, general-purpose grapheme-to-phoneme rules are to a significant extent inappropriate for names (Macchi and Spiegel, 1990; Vitale, 1991).

• Names are not equally amenable to morphological processes, such as word formation and derivation or to morphological decomposition, as regular words are. That does not render such an approach unfeasible, though, as we show in this paper.

• The large number of different names together with a restricted morphological structure leads to a coverage problem: It is known that a relatively small number of high-frequency words can cover a high percentage of word tokens in arbitrary text; the ratio is far less favorable for names (Carlson, Granström, and Lindström, 1989; van Coile, Leys, and Mortier, 1992).

We will now illustrate some of the idiosyncrasies and peculiarities of names that the analysis has to cope with. Let us first consider morphological issues. Some German street names can be morphologically and lexically analyzed, such as Kurfürst-en-damm ('electoral prince dam'), Kirche-n-weg ('church path'). Many, however, are not decomposable, such as Henmerich ('?') or Rimparstraße ('?Rimpar street'), at least not beyond obvious and unproblematic components (Straße, Weg, Platz, etc.).

Even more serious problems arise on the phonological level. As indicated above, general-purpose pronunciation rules often do not apply to names. For instance, the grapheme <e> in an open stressed syllable is usually pronounced [e:]; however, in many first names (Stefan, Melanie) it is pronounced [ɛ]. Or consider the word-final grapheme string <ie> in Batterie [batər'i:] 'battery', Materie [mat'e:riə] 'matter', and the name Rosemarie [r'o:zəmari:]. And word-final <us>: Mus [m'u:s] 'mush, jam' vs. Erasmus [er'asmus]. A more special and yet typical example: In regular German words the morpheme-initial substring <chem> as in chemisch is pronounced [çe:m], whereas in the name of the city Chemnitz it is pronounced [kɛm].

Generally speaking, nothing ensures correct pronunciation better than a direct hit in a pronunciation dictionary. However, for the reasons detailed above this approach is not feasible for names. In short, we are not dealing with a memory or storage problem but with the requirement to be able to approximately correctly analyze unseen orthographic strings. We therefore decided to use a weighted finite-state transducer machinery, which is the technological framework for the text analysis components of the Bell Labs multilingual TTS system. FST technology enables the dynamic combination and recombination of lexical and morphological substrings, which cannot be achieved by a static pronunciation dictionary. We will now describe the procedure of collecting lexically or morphologically meaningful graphemic substrings that are used productively in name formation.

3 Productive name components

3.1 Database

Our training material is based on publicly available data extracted from a phone and address directory of Germany. The database is provided on CD-ROM (D-Info, 1995). It lists all customers of Deutsche Telekom by name, street address, city, phone number, and postal code. The CD-ROM contains data retrieval and export software. The database is somewhat inconsistent in that information for some fields is occasionally missing, more than one person is listed in the name field, business information is added to the name field, first names and street names are abbreviated. Yet, due to its listing of more than 30 million customer records it provides an exhaustive coverage of name-related phenomena in German.

3.2 City names

The data retrieval software did not provide a way to export a complete list of cities, towns, and villages; thus we searched for all records listing city halls, township and municipality administrations and the like, and then exported the pertinent city names. This method yielded 3,837 city names, approximately 15% of all the cities (including urban districts) covered in the database. It is reasonable to assume, however, that this corpus provided sufficient coverage of lexical and morphological subcomponents of city names.

We extracted graphemic substrings of different lengths from all city names. The length of the strings varied from 3 to 7 graphemes. Useful substrings were selected using frequency analysis (automatically) and native speaker intuition (manually). The final list of morphologically meaningful substrings consisted of 295 entries. In a recall test, these 295 strings accounted for 2,969 of the original list of city names, yielding a coverage of 2,969/3,837 = 77.4%.

3.3 First names

The training corpus for first names and street names was assembled based on data from the four largest cities in Germany: Berlin, Hamburg, Köln (Cologne) and München (Munich). These four cities also provide an approximately representative geographical and regional/dialectal coverage.

[...] city, and so on.

Table 1 gives the numbers corresponding to the steps of the procedure just described. The number of morphemes collected from the four cities is 1,940. The selection criterion was frequency: Component types occurring repeatedly within a city database [...]

                    München   Berlin   Hamburg   Köln     Total
                    (south)   (east)   (north)   (west)
component types     7,127     7,291    8,027     4,396    26,841
morphemes           922       574      320       124      1,940
recall              2,387     2,538    4,214     2,102    11,241
residuals (abs.)    4,740     4,753    3,813     2,294    15,600
residuals (rel.)    66.5%     65.0%    47.5%     52.2%    58.1%

Table 1: Extraction of productive street name components: quantitative data.