Multilingual Text Analysis for Text-To-Speech Synthesis
Total Page:16
File Type:pdf, Size:1020Kb
Multilingual Text Analysis for TexttoSp eech Synthesis Richard Sproat Sp eech Synthesis Research Department Bell Lab oratories Lucent Technologies Mountain Avenue Ro om d Murray Hill NJ USA rwsbelllabscom In many TTS systems the rst three tasks word seg Abstract We present a mo del of text analysis for textto mentation and digit and abbreviation expansion would sp eech TTS synthesis based on weighted nitestate trans b e classed under the rubric of text normalization and would ducers which serves as the textanalysis mo dule of the mul generally b e handled prior to and often in a quite diererent tilingual Bell Labs TTS system The transducers are con fashion from the last two problems which fall more squarely structed using a lexical to olkit that allows declarative de within the domain of linguisti c analysis scriptions of lexicons morphologica l rules numeralexpansion One problem with this approach is that in many cases the rules and phonologi cal rules inter alia To date the mo del selection of the correct linguisti c form for a normalized item has b een applied to eight languages Spanish Italian Roma cannot b e chosen b efore one has done a certain amount of lin nian French German Russian Mandarin and Japanese guistic analysis Consider an example that is problematic for the Bell Lab oratories American English TTS system a system Intro duction that treats text normalization prior to and separately from The rst task faced by any texttosp eech TTS system is the rest of linguisti c analysis If one encounters the string the conversion of input text into an internal linguisti c repre $ in an English text the normal expansion would b e ve sentation This is in general a complex task since the written dol lars But this expansion is not always correct when func form of any language is at b est an imp erfect representation tioning as a prenominal mo dier as in the phrase $ bil l the of the corresp onding sp oken forms Among the problems that correct expansion is ve dol lar since in general plural noun one faces in handling ordinary text are the following forms cannot function as mo diers in English The analysis of complex noun phrases in the American English system cf While a large numb er of languages delimit words using comes later than the prepro cessing phase and since a whitespace or some other device some languages such as hard decision has b een made in the earlier phase the system Chinese and Japanese do not One is therefore required pro duces an incorrect result to reconstruct word b oundaries in TTS systems for these An even more comp elling example can b e found in Russian languages While in English the p ercentage symb ol when denoting a Digit sequences need to b e expanded into words and more p ercentage is always read as percent in Russian selecting the generally into wellformed numb er names so in En correct form dep ends on complex contextual factors The rst glish would generally b e expanded as two hundred and forty decision that needs to b e made is whether or not the numb er three p ercent string is mo difying a following noun Russian in gen Abbreviations need to b e expanded into full words In gen eral disallows nounnoun mo dication in constructions equiv eral this can involve some amount of contextual disam alent to nounnoun comp ounds in English the rst noun must biguation so kg can b e either kilogram or kilograms de b e converted into an adjective thus rog rye but rzhanoj xleb p ending up on the context ryeadj bread rye bread This general constraint applies Ordinary words and names need to pronounced In many equally to procent p ercent so that the correct rendition of languages this requires morphological analysis even in lan skidka twenty p ercent discount is dvadcatiprocentnaja guages with fairly regular sp elling morphological struc skidka twenty p ercentadj discount gen nomsgfem nomsgfem ture is often crucial in determining the pronunciation of a Note that not only do es procent have to b e in the adjectival word form but as with any Russian adjective it must also agree Proso dic phrasing is only sp oradicall y indicated by punc in numb er case and gender with the following noun Observe tuation marks in the input and phrasal accentuation is almost never indicated At a minimum some amount of But see which treats numeral expansion as an instance of lexical analyis in order to determine eg grammatical part morphological analysis and also the work of van Leeuwen of sp eech is necessary in order to predict which words to which uses cascaded rewrite rules within his To o iP system to L make prominent and where to place proso dic b oundaries wards the same ends c R Sproat Pro ceedings of the ECAI Workshop Extended Finite State Models of Language Edited by A Kornai also that the word for twenty must o ccur in the genitive Overall Architecture case In general numb ers which mo dify adjectives in Russian Let us start with the example of the lexical analysis and pro must o ccur in the genitive case consider for example etazh nunciation of ordinary words taking again an example from storey and dvuxetazhnyj two storeyadj gen nomsgmasc Russian Russian orthography is often describ ed as morpho If the p ercentage expression is not mo difying a following noun logical meaning that the orthography represents not a then the nominal form procent is used However this form ap surface phonemic level of representation but a more abstract p ears in dierent cases dep ending up on the numb er it o ccurs level This is description is correct but from the p oint of view with With numb ers ending in one including comp ound num of predicting word pronunciation it is noteworthy that Rus b ers like twenty one procent o ccurs in the nominative sin sian with a welldened set of lexical exceptions is almost gular After socalled paucal numb ers two three four and completely phonemic in that one can predict the pronuncia their comp ounds the genitive singular procenta is used Af tion of most words in Russian based on the sp elling of those ter all other numb ers one nds the genitive plural procentov words provided one knows the placement of word stress So we have odin procent one p ercent dva procenta nomsg since several Russian vowels undergo reduction to varying de two p ercent and pyat procentov ve p ercent gensg genpl grees dep ending up on their p osition relative to stressed syl All of this however presumes that the p ercentage expression lables The catch is that word stress usually dep ends up on as a whole is in a nonoblique case If the expression is in knowing lexical prop erties of the word includin g morphologi an oblique case then b oth the numb er and procent show up cal class information To take a concrete example consider the in that case with procent b eing in the singular if the num word kostra Cyrillic ko stra b onregenitivesi ngul ar This b er ends in one and the plural otherwise s odnym procen word b elongs to a class of masculine nouns where the word tom with one p ercent with one p ercent instrsgmasc instrsg stress is placed on the inectional ending where there is one s pjatju procentami with ve p ercent with instrpl instrpl 0 Thus the stress pattern is kostr a and the pronunciation is ve p ercent As with the adjectival forms there is nothing 0 kstr with the rst o reduced to Let us assume that p eculiar ab out the b ehavior of the noun procent all nouns ex the morphological representation for this word is something hibit similar b ehavior in combination with numb ers cf 0 like kostrfnoungfmascgfinang afsggfgeng where for con The complexity of course arises b ecause the written form venience we represent phonological and morphosyntactic in gives no indication of what linguisti c form it corresp onds to formation as part of the same string Assuming a nite Furthermore there is no way to correctly expand this form state mo del of lexical structure we can easily imag without doing a substantial amount of analysis of the context ine a set of transducers M that map from that level into a includi ng some analysis of the morphological prop erties of the level that gives the minimal morphologicall ymoti vated an surrounding words as well as an analysis of the relationship notation MMA necessary to pronounce the word In this of the p ercentage expression to those words 0 case something like kostr a would b e appropriate Call this The obvious solution to this problem is to delay the deci lexicaltoMMA transducer L such a transducer can b e w or d sion on how exactly to transduce symb ols like in English constructed by comp osing the lexical acceptor D with M so or in Russian until one has enough information to make that L D M A transducer that maps from the MMA w or d the decision in an informed manner This suggests a mo del to the standard sp elling kostra ko stra would among other where say an expression like in Russian is transduced things simply delete the stress marks call this transducer S into all p ossible renditions and the correct form selected from The comp osition L S then computes the mapping from w or d the lattice of p ossibili ties by ltering out the illegal forms the lexical level to the surface orthographic level and its in An obvious computational mechanism for accomplishin g this verse L S S L computes the mapping from w or d w or d is the nitestate transducer FST Indeed since it is well the surface to all p ossible lexical representations for the text known that FSTs can also b e used to mo del most morphol word A set of pronunciatio n rules compiled into