How to Integrate Phonetic and Linguistic Knowledge in a Text-To-Phoneme Conversion Task: a Syllabic TPC Tool for French
INTERSPEECH 2004 - ICSLP, 8th International Conference on Spoken Language Processing, ICC Jeju, Jeju Island, Korea, October 4-8, 2004
ISCA Archive, http://www.isca-speech.org/archive
DOI: 10.21437/Interspeech.2004-480

Nicole Beringer
IDSIA: Dalle Molle Institute for Artificial Intelligence
Galleria 2, 6928 Manno-Lugano, Switzerland
[email protected]

Abstract

The goal of this work is to present a text-to-phoneme conversion (TPC) check tool, which includes phonetic and linguistic knowledge, for generating dictionaries usable for Speech Synthesis and Speech Recognition.¹

The aim is to improve and accelerate the task of producing canonic pronunciation dictionaries in languages where no simple text-to-phoneme conversion is possible.

The prototype was developed for a French syllable-based text-to-phoneme task but it is portable to other languages.

¹ In this paper we mostly refer to speech synthesis, but the TPC checker can also be used in an ASR environment.

1. Introduction

In Speech Synthesis as well as in Automatic Speech Recognition (ASR) the phonetic transcript of the orthographic word form is needed. Therefore, dictionaries are generated which map the orthographic form to the canonical phonetic form of a word.

Despite sounding easy, this is in fact quite difficult because the corresponding phonetic transcripts have to be a hundred percent correct for further use.

In order to guarantee that there will not be errors in speech synthesis or ASR caused by an incorrect TPC, all entries of the dictionaries have to be corrected manually - a very time-consuming job.

Of course, the better the TPC transcript, the less correction is needed, which in turn minimizes the realtime factor of that work.

One way to improve the TPC transcript and to minimize the realtime factor of the manual correction is to use phonetic and linguistic knowledge within the TPC.

Former studies [1] on German show that there are numerous problems in mapping German orthography to a high quality transcript without using any linguistic background information.

Including knowledge of the morphological structure of words [2] achieves a high accuracy, but it has to be considered that the morphological decomposition is difficult and has a high error rate. It was recently shown [1] that the performance of a syllable-based TPC is comparable to that of the morpheme-based one but that it is not as error-prone in its implementation.

This consideration was taken into account in this work to develop a TPC tool and a TPC checker for French. The work resulted in two main parts:

• a TPC tool for French, which includes syllabic knowledge of French (SAM-PA Version [4])
• a TPC check tool, which checks whether a phoneme string is a possible syllable for French or not.

The former was developed analogously to the German P-TRA system [3, 1] (see section 2); the latter includes a new method for checking whether a syllable is possible in a language or not. Of course, this is the main problem for the TPC tool.

In section 3 the problems for TPC in French are shown. An analysis of French syllable structure is given in section 4 and we describe our prototype for the French TPC checker in section 5. Finally, results and future work are discussed in the last section.

2. Syllable-based TPC

2.1. Basics

As a starting point, translating the orthographic form of an utterance to its phonetically transcribed equivalent may be done by a simple text-to-phoneme conversion, where each letter is translated into an IPA [6] or SAM-PA [5] symbol to obtain the canonic form of the word.

Each orthographic form has one standard phonetic reference, which is called the standard phonetic or canonic form of the word.

This conversion presents little or no difficulty in languages where the letters of the orthographic form map one-to-one to the sounds of the canonic form. Languages which mostly map the letters one-to-one to the sounds are, for example, Spanish, Italian and Portuguese.

Other languages which adhere to a complex orthography, like German, English or French, exhibit a non-trivial conversion. Within this paper, complex orthography is defined as an orthographic form which is based on etymological word forms and therefore may differ to a greater or lesser extent from the phonetic realization.

2.2. Including Phonotactics

To deal with complex orthographic word forms we have to know something about the phonotactics of the language to be converted. This is often done by including the morphological structure of words, but although this is highly accurate for TPC, the required morphological decomposition is tedious and error-prone.

[1] describes an alternative approach which proposes using the orthographic syllable instead of morphemes to deal with the multiple choice of spellings for the same phoneme in German.

As will be described below, the main problem is the ambiguity of the orthographic representation of the same phoneme in all languages which do not map letter to sound one-to-one.

2.3. TPC Problems in German - Some Examples

In German we have the problem of vowel lengthening vs. spoken /@/ or /h/ depending on the syllabic surrounding, a problem which is easy to solve for human beings but impossible for a naive TPC system without any further description.

This problem is multiplied by the possibility of compound words. In German it is possible to combine almost every word with another to form a new one. Consonant or vowel clusters typical for word boundaries are no longer spoken as in the isolated word. To solve this problem one has to consider the syllabic structure.

Another typical problem for German is the use of glottal stops before vowels at the beginning of a word or after another vowel; this can be solved by taking the syllabic structure of a word into account.

However, even when these problems have been modelled by syllable structure, there still exist exceptions, e.g. anglicisms in German. A TPC system has to model the "denglish" pronunciation instead of using the right native form in order to obtain consistent results. This problem arises in all languages regardless of the transcription method.

For German we could use the P-TRA (Phonetische TRAnskription) TPC system [3] to translate text to phonemes. Syllable information is directly included in the rewrite rules of the form [1]:

    left context | orthographic searchstring | right context => phonemic correlate

Contrary to our TPC for French presented here, the German P-TRA needs the orthographic syllable structures in the input to achieve good conversion results.

3. TPC for French

The TPC for French presented here was developed to automatically transcribe French orthographic words into their canonical form for synthesis and recognition purposes.

In contrast to the German P-TRA, which derives syllabic information from the orthography, we have to include it - as well as the intonation information - in the canonic form.

3.1. The TPC-Tool

The text-to-phoneme conversion is done in three main steps:

• letter to phonemic conversion of the relevant sound. Here, syllabic information is included to avoid ambiguity.
• syllable structure of the canonic form: syllables within the canonic form are marked by "|".
• main accent is set to the relevant syllable of the canonic form.

The following paragraphs give a detailed description of how the three steps are realized:

TPC: Starting with a syllable-based conversion, the system modifies the orthographic representation in order to get the phonetic transcript. This is done as follows:

• The longest matching orthographic searchstring is changed into the phonetic equivalent using conversion rules. The searchstring can be the whole word or a syllabic structure.
• While converting this searchstring into its phonetic representation, syllable markers are set according to the syllabic structure. Note that whole words are divided into syllables and therefore satisfy syllabicity.
• For those orthographic parts where no syllabic structure is given yet, the system maps letter to sound one-to-one depending on French phonotactics.

Note as well that, in contrast to P-TRA, we do not consider the environment in which syllables are found.

Once the canonic form has been produced, "liaison" information - information on how word boundaries should concatenate - is given according to [7]. We distinguish between an obligatory "liaison" and an optional one. The latter may be included, the former must be included. Besides the "liaison" there are word boundaries which are never concatenated. This can be found in the rules as well.

Syllabic Structure: If the result of a conversion is known to be a syllable, the boundaries to the adjacent elements are marked with "|".

Accentuation: In French, normally the last syllable of a word is accentuated if its nucleus is not /@/. This information is included in the canonic form.

3.2. TPC Problems in French

It is clear from the TPC description for French that there are problems - mainly of ambiguity - in the conversion processing.

Similarly to German there are ambiguities in vowel

• progressive: "e instable", after or in a "mot phonétique"

It is optional in any other case.

5. TPC Check Tool for French

In section 3.2 we showed some of the problems we have in the text-to-phoneme conversion, which make the TPC result somewhat error-prone. Including known French syllables in the TPC tool improves the result, but we still have unspecified sequences which could consist of impossible phoneme concatenations or which are not segmented on a syllabic basis.

Therefore, we analyzed the phonetic and syllable structure of French and defined "possible" phonetic syllables.
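
To make the idea of a "possible syllable" check concrete, the following is a minimal sketch in Python, not the paper's implementation. It assumes that a syllable consists of an optional onset, exactly one vowel nucleus and an optional coda. The SAM-PA inventory, the cluster tables (EXTRA_ONSETS, BANNED_ONSETS, CODA_CLUSTERS) and the function names are illustrative assumptions; the actual rule set of the check tool is not given in this part of the text.

```python
# Illustrative French SAM-PA inventory (an assumption, not taken from the paper).
VOWELS = {"i", "e", "E", "a", "A", "O", "o", "u", "y", "2", "9", "@",
          "e~", "a~", "o~", "9~"}
OBSTRUENTS = set("pbtdkgfvszSZ")
LIQUIDS = {"l", "R"}
GLIDES = {"j", "w", "H"}
NASALS = {"m", "n", "J", "N"}
CONSONANTS = OBSTRUENTS | LIQUIDS | GLIDES | NASALS

# Hypothetical toy phonotactic tables, for illustration only.
EXTRA_ONSETS = {("s", "p"), ("s", "t"), ("s", "k")}
BANNED_ONSETS = {("t", "l"), ("d", "l")}
CODA_CLUSTERS = {("R", "t"), ("R", "d"), ("s", "t"), ("k", "t"),
                 ("t", "R"), ("b", "l")}


def tokenize(sampa):
    """Split a SAM-PA string into symbols, keeping nasal vowels ('a~') whole."""
    symbols = []
    i = 0
    while i < len(sampa):
        # A following '~' belongs to the current symbol.
        sym = sampa[i:i + 2] if sampa[i + 1:i + 2] == "~" else sampa[i]
        if sym not in VOWELS and sym not in CONSONANTS:
            raise ValueError(f"unknown SAM-PA symbol {sym!r}")
        symbols.append(sym)
        i += len(sym)
    return symbols


def onset_ok(onset):
    """Accept empty, single-consonant and a few cluster onsets."""
    if len(onset) <= 1:
        return True
    # Peel off an optional final glide, as in /tRwa/.
    core = onset[:-1] if onset[-1] in GLIDES else onset
    if len(core) == 1:
        return core[0] not in GLIDES
    if len(core) != 2 or core in BANNED_ONSETS:
        return False
    obstruent_liquid = core[0] in OBSTRUENTS and core[1] in LIQUIDS
    return obstruent_liquid or core in EXTRA_ONSETS


def coda_ok(coda):
    """Accept empty and single-consonant codas plus the listed clusters."""
    return len(coda) <= 1 or coda in CODA_CLUSTERS


def is_possible_syllable(sampa):
    """Return True if the phoneme string has exactly one nucleus
    and its onset and coda pass the toy phonotactic tables."""
    symbols = tokenize(sampa)
    nuclei = [i for i, s in enumerate(symbols) if s in VOWELS]
    if len(nuclei) != 1:
        return False
    n = nuclei[0]
    return onset_ok(tuple(symbols[:n])) and coda_ok(tuple(symbols[n + 1:]))
```

With these toy tables, is_possible_syllable("tRwa") and is_possible_syllable("paRt") are accepted, while "tla" (banned onset), "pt" (no nucleus) and "papa" (two nuclei) are rejected. A real checker would replace the tables with the syllable inventory derived from the analysis of French syllable structure.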