
INTERSPEECH 2004 -- ICSLP
8th International Conference on Spoken Language Processing
ICC Jeju, Jeju Island, Korea, October 4-8, 2004
ISCA Archive: http://www.isca-speech.org/archive

How to Integrate Phonetic and Linguistic Knowledge in a Text-to-Phoneme Conversion Task: A Syllabic TPC Tool for French

Nicole Beringer

IDSIA: Dalle Molle Institute for Artificial Intelligence Galleria 2, 6928 Manno-Lugano, Switzerland [email protected]

DOI: 10.21437/Interspeech.2004-480

Abstract

The goal of this work is to present a text-to-phoneme conversion (TPC) check tool which includes phonetic and linguistic knowledge¹ and generates dictionaries usable for speech synthesis and speech recognition.

The aim is to improve and accelerate the production of canonic pronunciation dictionaries for languages where no simple text-to-phoneme conversion is possible.

The prototype was developed for a French syllable-based text-to-phoneme task, but it is portable to other languages.

¹ In this paper we mostly refer to speech synthesis, but the TPC checker can also be used in an ASR environment.

1. Introduction

In speech synthesis as well as in Automatic Speech Recognition (ASR), the phonetic transcript of the orthographic word form is needed. Therefore, dictionaries are generated which map the orthographic form to the canonical phonetic form of a word. Although this sounds easy, it is in fact quite difficult, because the corresponding phonetic transcripts have to be a hundred percent correct for further use.

In order to guarantee that there will be no errors in speech synthesis or ASR caused by an incorrect TPC, all entries of the dictionaries have to be corrected manually - a very time-consuming job.

Of course, the better the TPC transcript, the less correction is needed, which in turn minimizes the realtime factor of that work.

One way to improve the TPC transcript and to minimize the realtime factor of the manual correction is to use phonetic and linguistic knowledge within the TPC.

Former studies [1] on German show that there are numerous problems in mapping German orthography to a high-quality transcript without using any linguistic background information.

Including knowledge of the morphological structure of words [2] achieves a high accuracy, but it has to be considered that the morphological decomposition is difficult and has a high error rate. It was recently shown [1] that the performance of a syllable-based TPC is comparable to that of the morpheme-based one, but that it is not as error-prone in its implementation.

This consideration was taken into account in this work to develop a TPC tool and a TPC checker for French. The work resulted in two main parts:

• a TPC tool for French, which includes syllabic knowledge of French (SAM-PA Version [4])
• a TPC check tool, which checks whether a phoneme string is a possible syllable for French or not.

The former was developed analogously to the German P-TRA system [3, 1] (see section 2); the latter includes a new method for checking whether a syllable is possible in a language or not. Of course, this is the main problem for the TPC tool.

In section 3 the problems for TPC in French are shown. An analysis of French syllable structure is given in section 4, and we describe our prototype for the French TPC checker in section 5. Finally, results and future work are discussed in the last section.

2. Syllable-based TPC

2.1. Basics

As a starting point, translating the orthographic form of an utterance to its phonetically transcribed equivalent may be done by a simple text-to-phoneme conversion, where each letter is translated into an IPA [6] or SAM-PA [5] symbol to obtain the canonic form of the word.

Each orthographic form has one standard phonetic reference, which is called the standard phonetic or canonic form of the word.

This conversion presents little or no difficulty in languages where the letters of the orthographic form map one-to-one to the sounds of the canonic form.
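The simple conversion described above, in which every letter is translated into exactly one symbol, can be sketched in a few lines. This is only an illustration: the letter table `LETTER_TO_PHONEME` and the function name `one_to_one_tpc` are hypothetical and not part of the tool described in this paper, and the table covers just the letters needed for the two example words.

```python
# Minimal sketch of a naive one-to-one text-to-phoneme conversion.
# LETTER_TO_PHONEME is a hypothetical, deliberately tiny table
# (letter -> SAM-PA-like symbol); it is an assumption for illustration.
LETTER_TO_PHONEME = {
    "a": "a", "b": "b", "e": "e", "m": "m",
    "n": "n", "o": "o", "u": "u",
}


def one_to_one_tpc(word: str) -> str:
    """Translate each letter independently; unknown letters become '?'."""
    return "".join(LETTER_TO_PHONEME.get(ch, "?") for ch in word.lower())


# Spanish "mano" is pronounced as it is written, so the naive result is usable.
print(one_to_one_tpc("mano"))
# French "beau" is pronounced /bo/, but the naive conversion keeps every letter.
print(one_to_one_tpc("beau"))
```

The second call illustrates why a letter-by-letter table is not sufficient for a language with complex orthography: the three letters "eau" correspond to a single vowel, which a one-to-one mapping cannot express.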
Languages which mostly map letters one-to-one to sounds are, for example, Spanish, Italian and Portuguese.

Other languages with a complex orthography, such as German, English or French, exhibit a non-trivial conversion. Within this paper, complex orthography is defined as an orthographic form which is based on etymological word forms and may therefore differ to a greater or lesser extent from the phonetic realization.

2.2. Including [...]

To deal with complex orthographic word forms we have to know something about the phonotactics of the language to be converted. This is often done by including the morphological structure of words, but although this is highly accurate for TPC, the required morphological decomposition is tedious and error-prone.

[1] describes an alternative approach which proposes using the orthographic syllable instead of morphemes to deal with the multiple spellings available for the same phoneme in German.

As will be described below, the main problem in all languages which do not map letter to sound one-to-one is the ambiguity of the orthographic representation of the same phoneme.

2.3. TPC Problems in German - Some Examples

In German we have the problem of lengthening vs. spoken /@/ or /h/ depending on the syllabic surrounding, a problem which is easy for human beings to solve but impossible for a naive TPC system without any further description.

This problem is multiplied by the possibility of compound words. In German almost every word can be combined with another to form a new one. Consonant or vowel clusters typical for word boundaries are then no longer spoken as in the isolated word. To solve this problem one has to consider the syllabic structure.

Another typical problem for German is the use of glottal stops before vowels at the beginning of a word or after another vowel; this can be solved by taking the syllabic structure of a word into account.

However, even when these problems have been modelled by syllable structure, there still exist exceptions, e.g. anglicisms in German. A TPC system has to model the "denglish" pronunciation instead of using the right native form in order to obtain consistent results. This problem arises in all languages regardless of the transcription method.

For German we could use the P-TRA (Phonetische TRAnskription) TPC system [3] to translate text to phonemes. Syllable information is directly included in the rewrite rules, which have the form:

left context | orthographic searchstring | right context => phonemic correlate [1].

Contrary to our TPC for French presented here, the German P-TRA needs the orthographic syllable structures in the input to achieve good conversion results.

3. TPC for French

The TPC for French presented here was developed to automatically transcribe French orthographic words into their canonical form for synthesis and recognition purposes.

In contrast to the German P-TRA, which derives syllabic information from the orthography, we have to include it - as well as the intonation information - in the canonic form.

3.1. The TPC-Tool

The text-to-phoneme conversion is done in three main steps:

• letter-to-phoneme conversion of the relevant sound. Here, syllabic information is included to avoid ambiguity.
• syllable structure of the canonic form: syllables within the canonic form are marked by "|".
• main accent is set on the relevant syllable of the canonic form.

The following paragraphs describe in detail how the three steps are realized.

TPC: Starting with a syllable-based conversion, the system modifies the orthographic representation in order to obtain the phonetic transcript. This is done as follows:

• The longest matching orthographic searchstring is changed into its phonetic equivalent using conversion rules. The searchstring can be the whole word or a syllabic structure.
• While this searchstring is converted into its phonetic representation, syllable markers are set according to the syllabic structure. Note that whole words are divided into syllables and therefore satisfy syllabicity.
• For those orthographic parts where no syllabic structure is given yet, the system maps letter to sound one-to-one depending on French phonotactics.

Note as well that, unlike P-TRA, we do not consider the environment in which the syllables are found.

Once the canonic form has been produced, "liaison" information - information on how word boundaries should concatenate - is given according to [7]. We distinguish between an obligatory "liaison" and an optional one. The latter may be included, the former must be included. Besides the "liaison" there are word boundaries which are never concatenated. This can be found in the rules as well.

Syllabic Structure: If the result of a conversion is known to be a syllable, the boundaries to the adjacent elements are marked with "|".

Accentuation: In French, the last syllable of a word is normally accented if its nucleus is not /@/. This information is included in the canonic form.

3.2. TPC Problems in French

It is clear from the TPC description for French that there are problems - mainly of ambiguity - in the conversion process.

Similarly to German, there are ambiguities in vowel realization: there are many-to-one conversions in each case where the letter has an accent. Furthermore, the same phoneme can have different orthographic strings.

Consonants vary in their representation depending on the environment they are found in. Before front vowels velars are represented as [...], before back vowels they are mapped one-to-one.

4. French Syllable Structure

French syllables [7] usually have an onset of mostly two, maximally four consonantal phonemes. In the case of three or four consonantal phonemes we have the sequence /st/ followed by /[...]/ followed by the semivowel /H/. Otherwise, the second consonant is preferably a liquid or semivowel.

The nucleus can be any kind of vowel. Note that the centralized schwa vowel never bears the word accent!

The coda consists of at most three consonantal phonemes. If the syllable has three coda consonants, they are concatenated as follows:

• plosive fricative plosive
• /R/ plosive /R/

In principle, if consonantal clusters in the phoneme sequence are interrupted by /@/, this vowel is skipped if the new sequence does not have more than two consonants. This is known by the term "mot phonétique".

"Liaison" is blocked in the following cases:

• the next word begins with an "h aspiré"
• after real pauses - after a "mot phonétique"

It is obligatory in case of assimilations:

• regressive: voiceless becomes voiced and vice versa
• progressive: "e instable", after [...] or in a "mot phonétique"

It is optional in any other case.

5. TPC Check Tool for French

In section 3.2 we showed some of the problems in the text-to-phoneme conversion which make the TPC result somewhat error-prone. Including known French syllables in the TPC tool improves the result, but we still have unspecified sequences which could consist of impossible phoneme concatenations or which are not segmented on a syllabic basis.

Therefore, we analyzed the phonetic and syllable structure of French and defined "possible" phonetic syllables.

5.1. "Possible" syllables

Definition: "Possible" syllables are phonetic strings between "|" which have not yet been covered by an existing translation rule (orthography translated into phoneme string) and which have a syllable structure following French phonotactics.

Table 1 shows the structure "possible" syllables have to follow.

    onset        nucleus                  coda
    {}           vowel (oral or nasal)    {}

    Remaining onset and coda entries, row by row (the boundary between the onset and coda cell within a row is not recoverable from the source layout):
    semivowel semivowel
    liquid semivowel liquid
    liquid
    plosive
    nasal plosive liquid
    plosive liquid fricative plosive
    fricative liquid fricative
    nasal liquid plosive fricative
    liquid semivowel nasal plosive
    semivowel nasal fricative
    fricative semivowel fricative nasal
    nasal semivowel liquid nasal
    /st/ liquid liquid plosive liquid
    /st/ liquid semivowel plosive fricative plosive

Table 1: classification of syllable structure

Note that we obtain possible syllables by concatenating the entries of the columns. Per column only one line is possible.

5.2. TPC Checker - Functionality

The main function of the TPC checker is to find out whether a phoneme sequence is a syllable or not.

Based on Table 1, all phoneme strings between "|" are checked. If the sequence differs from all given possible syllables, the TPC string is first looked up in a list of "impossible" syllables.

If a "syllable" is found in the list of impossible syllables, an automatic phoneme string correction is performed according to previously encountered manual corrections, in order to obtain better results in the checking part (self-learning system).

Otherwise, the "syllable" and its word environment are echoed to be checked by a human.

If the expert considers the "syllable" in question to be indeed correct, the phoneme string is included in the conversion rules of the TPC tool and the TPC checker; otherwise it is included in the list of impossible syllables together with the manual correction.

Based on the word accent rule

    Word accent is set on the last syllable if its nucleus is not /@/

the position of the word accent is checked and corrected if wrong.

5.3. TPC Checker - Results

Table 2 shows the TPC error rate of the simple one-to-one conversion, the syllabic TPC for French and the TPC checked strings compared to the manually corrected form, as well as the word accent error rate. The number of known syllables is 5021; the TPC checker includes 225 additionally possible syllables. The number of words to be converted is 16249.

                                 correct words   word accent error %   correct %
    one-to-one                        2637              9.10             16.23
    syllable-based TPC               10760              5.23             66.13
    TPC checker auto                 11414              4.73             70.25
    TPC checker manual prompt        15669              3.57             96.43

Table 2: Comparison of the TPC error rate of the three conversion methods for French

Although the one-to-one conversion performs far too poorly to be usable, its word accent error is not too bad. This proves that the accented ultima in French is very regular. With the syllable-based TPC we gain almost 50 percentage points in word correctness (from 16.23% to 66.13%). The word accent error improves by about 4 percentage points as well (from 9.10% to 5.23%). With the checker, word correctness improves by about 4 percentage points when only the automatic corrections from the list of impossible syllables are included (TPC checker auto); the word accent error improves by 0.5 percentage points.

The unknown syllables prompted for manual correction (TPC checker manual prompt) cover another 26.18 percentage points, so that a conversion correctness of 96.43% is reached.

6. Conclusion

It was shown that if phonetic and linguistic knowledge is included in TPC, the performance of the phonetic transcript improves regardless of the language.

We have further shown that syllable-based TPC for French reaches a moderate performance. This can be drastically improved by using syllabic schemes in a checking sequence.

Although we still need human correction to obtain phonemic representations for speech synthesis and speech recognition dictionaries, the realtime factor of the correction step decreases with every newly learned syllable.

Finally, we will include an environment check for the defined syllable structure, following the P-TRA TPC, to better identify the right syllables.

7. References

[1] Libossek, M.; Schiel, F.: "Syllable-based Text-to-Phoneme Conversion for German", in: Proc. of the International Conference on Spoken Language Processing, Beijing, 2000, pp. 283-286.

[2] Wothke, K.: "Automatic [...] taking into account the morphological structure of words", in: Technical Report, IBM Germany, Scientific Center, Heidelberg, 1991.

[3] Stock, D.: "P-TRA - Eine Programmiersprache zur phonetischen Transkription", in: Hess, W.; Sendlmeier, W. (eds): Beiträge zur Angewandten und Experimentellen Phonetik, Franz Steiner, Stuttgart, 1992.

[4] Heuven, V. J. Van; Pols, L.C.W. (eds): "Analysis and Synthesis of speech", Walter de Gruyter, Berlin, 1993.

[5] www.phon.ucl.ac.uk/home/sampa/home.htm.

[6] Handbook of the International Phonetic Association.

[7] Martinet, Walter, Carton, Lucci: "Etude phonétique du français contemporain à travers la variation situationelle", Grenoble, 1983.
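To make the checking procedure of section 5.2 more concrete, the following sketch walks through its decision sequence: structural check, lookup in the list of "impossible" syllables with automatic correction, and prompting for manual review, followed by the word accent rule. It is a simplified illustration under several assumptions, not the implementation evaluated above: the structural test only enforces the limits stated in section 4 (at most four onset consonants, one vowel nucleus, at most three coda consonants) rather than the full Table 1, the symbol inventory is a reduced French SAM-PA set, syllables are written as space-separated symbols, the fallback to the preceding syllable when the last nucleus is /@/ is assumed, and all names and example data are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Reduced French SAM-PA inventory (assumed for illustration).
VOWELS = {"i", "e", "E", "a", "A", "O", "o", "u", "y", "2", "9", "@",
          "e~", "a~", "o~", "9~"}
CONSONANTS = {"p", "b", "t", "d", "k", "g", "f", "v", "s", "z", "S", "Z",
              "m", "n", "J", "N", "l", "R", "j", "w", "H"}


def is_possible_syllable(symbols: List[str]) -> bool:
    """Coarse stand-in for Table 1: <=4 onset consonants, exactly one
    vowel nucleus, <=3 coda consonants, known symbols only."""
    if any(s not in VOWELS and s not in CONSONANTS for s in symbols):
        return False
    vowel_positions = [i for i, s in enumerate(symbols) if s in VOWELS]
    if len(vowel_positions) != 1:
        return False
    nucleus = vowel_positions[0]
    return nucleus <= 4 and len(symbols) - nucleus - 1 <= 3


def check_transcript(transcript: str, known: Set[str],
                     impossible: Dict[str, str]) -> Tuple[str, List[str]]:
    """Check every string between '|'. Returns the transcript with automatic
    corrections applied and the syllables that need manual review."""
    checked: List[str] = []
    to_review: List[str] = []
    for raw in transcript.split("|"):
        syllable = raw.strip()
        if syllable in known or is_possible_syllable(syllable.split()):
            checked.append(syllable)              # accepted as it is
        elif syllable in impossible:
            checked.append(impossible[syllable])  # reuse earlier manual fix
        else:
            checked.append(syllable)              # unchanged, echoed to a human
            to_review.append(syllable)
    return " | ".join(checked), to_review


def accent_index(syllables: List[List[str]]) -> Optional[int]:
    """Word accent rule: last syllable unless its nucleus is /@/.
    Falling back to the preceding syllable is an assumption."""
    for i in range(len(syllables) - 1, -1, -1):
        if "@" not in syllables[i]:
            return i
    return None


# Hypothetical data: "b R" has an earlier manual correction, "k s" has none.
corrected, review = check_transcript(
    "a R | b R | k s", known=set(), impossible={"b R": "b R @"})
print(corrected)  # a R | b R @ | k s
print(review)     # ['k s']
```

In the self-learning loop described in section 5.2, a syllable confirmed by the expert would be added to `known`, and a rejected one would be added to `impossible` together with its manual correction, so that each review reduces the number of later prompts.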