Constructing a Multilingual Phoneme List for Polyglot Speech Synthesiser

Nur-Hana Samsudin, Mark Lee
School of Computer Science
University of Birmingham
United Kingdom
{n.h.samsudin, m.g.lee}@cs.bham.ac.uk

Abstract

We describe our approach to constructing a phoneme set for polyglot speech synthesis. In polyglot speech synthesis, resources are shared across languages. The goal of this research is to develop a global phoneme set using existing resources, and MBROLA has been selected for this purpose. MBROLA provides 72 diphone databases covering different languages, each with its own set of phonemes. We have selected 31 language databases out of these 72. By reusing existing resources, we are able to gather a global phoneme set faster and with wider language coverage. The set can therefore be used for languages with limited linguistic expertise or limited linguistic resources. Our approach includes extracting the phonemes of these languages, clustering them, eliminating and substituting inaccurate phonemes, and finally evaluating the list of phonemes obtained. At the end of this study, we arrive at one complete list forming a global phoneme set. This list can be used as a substitute for unavailable phonemes in future polyglot TTS systems. It is also suitable as a default phoneme set for new languages whose phoneme sets are not yet defined.

Keywords: polyglot TTS, multilingual, speech synthesis, global phoneme set, resource poor languages

1. Introduction

The International Phonetic Alphabet (IPA) is a standard representation of phonetics for all languages. It provides a complete set of phonetic symbols to represent the sounds of phonemes that are already attested in a language or that can be produced by the human articulatory system. The question therefore needs to be addressed: why is there a need to construct a multilingual phoneme list for polyglot speech synthesis?

IPA provides a generic concept and instances of speech sounds. In actual speech, however, not all sounds listed in IPA are used regularly. On this account, it is possible to have a speech synthesis system for resource poor languages in which the phoneme set is obtained from other languages. Since polyglot TTS facilitates sharable data, including phonetic resources, the concept of a multilingual phoneme set complements the implementation of polyglot TTS.

In one previous study on multilingual phonemes by Altosaar et al. (1996), an intermediary representation was introduced. The representation is able to convert from IPA, SAMPA, X-SAMPA, TIMIT and other notations into their notation, called Worldbet. The Worldbet list not only covers IPA transcription (Hieronymus, 1993), but also suggests symbols based on a five-year study conducted on a speech database. Knowledge of the sounds of different languages therefore needs to be obtained before an algorithm for mapping symbols from one notation to another can be constructed. This differs from our approach, in which the global phoneme set is constructed with limited resources.

The aim of constructing a multilingual phoneme list is not to replace IPA, SAMPA or X-SAMPA. What we propose is a default or initial phoneme set which can be used in a polyglot TTS architecture or in TTS for resource poor languages. The global phoneme set obtained at the end can be used independently.

For the scope of our research, we use SAMPA as the phonetic representation, given that this is what MBROLA uses (MBROLA, 2005). MBROLA is used only as the resource. SAMPA is one of the phonetic representations that follows IPA transcription closely. We chose MBROLA because it already has a rich collection of phoneme sets for 31 languages (MBROLA, 2005), with a few variations for some languages. Making use of available resources not only makes the work possible without linguistic expertise but also makes the standardising and identifying process faster.

Standard phonetic notation distinguishes the different phonemes used in pronunciation. In polyglot speech synthesis, the resources must be put together (Romsdorfer, 2007) and selection is done based on phoneme labelling. With a standard notation, one phoneme will not be mistaken for another. With a standard representation, it is also possible to reuse the phonemes of selected languages in other speech applications or phonetics-related work.

This paper is organised as follows. In Section 2, we discuss the process of creating a multilingual phoneme list. We describe the extraction and analysis process and explain some of the issues we encountered during the analysis phase. We then present the outcome of our research in the multilingual phoneme list subsection. In Section 3, we compare the outcome of our research with the phoneme sets used by linguists for the following languages: English (RP), Latin, Italian, German and French. This is followed by the discussion and conclusion.

2. Constructing a Multilingual Phoneme List

There are three processes in phoneme list construction: Extraction, Analysis and Evaluation. In the Extraction process, all phonemes from the MBROLA database are retrieved. For the scope of this research, we collected all phonemes in each language that are available in the MBROLA database. Analysis has two parts: the clustering process and the elimination/substitution process. In Evaluation, the result of this study is compared with the validated phoneme sets of the stated languages.

There are 31 languages listed in MBROLA: Afrikaans, Dutch, English, German, Icelandic, Swedish, French, Italian, Latin, Romanian, Spanish, Croatian, Czech, Lithuanian, Polish, Breton, Farsi, Greek, Hindi, Estonian, Hungarian, Arabic, Hebrew, Japanese, Korean, Turkish, Indonesian, Malay, Maori and Telugu. All the phonemes use the SAMPA notation.

2.1. Extraction

The phonemes extracted from the 31 languages initially yield 357 unique SAMPA symbols. In this list, we notice that the phonemes can be classified into two types: basic phonemes, where the phoneme maps directly to a consonant or vowel of the IPA; and derived phonemes, where the phoneme is an entity constructed from the combination of a basic phoneme with diacritic and/or suprasegmental symbols.

Before we go into greater detail, it is necessary to describe the different types of classification in IPA. Symbols in IPA are classified into consonants, vowels, diacritics, suprasegmentals, and tones and word accents. However, in SAMPA, as described by Wells (2003), tones and word accents need to be labelled in a different tier (isolated from the phoneme tier). This issue is beyond the scope of this paper.

Therefore, according to the IPA chart, a phoneme belongs to one of the following categories: vowels, pulmonic consonants, non-pulmonic consonants and other symbols. The phoneme may also carry diacritic and/or suprasegmental symbols. Diphthongs are not listed in the chart. This is understandable because a diphthong consists of two consecutive vowels that glide or assimilate with one another in production to become a phoneme.

Contrary to the IPA chart, we classify our phonemes quite differently in our analysis. We treat every vowel and consonant (both pulmonic and non-pulmonic) as a phoneme entity. However, we also treat diphthongs as entities. We also have instances of derived phonemes, which combine a consonant or a vowel with diacritic and/or suprasegmental values, and we treat the combined attributes as a phoneme entity as well. It is important to retain the derived phonemes because the phone produced has a unique sound as compared to the [...]

[...] is vowels, diphthongs and consonants. For each cluster, there are also derived phonemes. The variations which occur most frequently are the lengthening of vowels, the gemination and aspiration of consonants, and palatalised phonemes. We also have clusters of similar sounds. For example, [r], [R] and [4], which in IPA are [r], [ʁ] and [ɾ] respectively, are clustered together. This second level of clustering is based on our own judgement that these phonemes come from a similar sound group, rather than on the manner or the place of articulation. With the second-level cluster, we can determine a possible substitute phoneme from a similar cluster during the synthesis process.

2.3. Elimination and Substitution

From the clustered phonemes, we are able to determine which SAMPA phonemes correspond to the standard IPA notation. At this phase, there are two reasons a phoneme may need to be substituted or eliminated: either the phoneme is not a standard SAMPA symbol, or the SAMPA symbol given does not correspond to the sound produced by the phoneme in MBROLA. When the phoneme is not written in standard SAMPA notation, there are two possible causes: the symbol was created to fit all phonemes of the target language and now clashes with another SAMPA symbol, or the symbol simply does not exist in SAMPA. When the SAMPA symbol does not correspond to the sound played by the MBROLA synthesiser, either an error occurred during the matching between the orthographic and phonetic forms, or the developer was using a different version of the SAMPA standard.

Based on these criteria, the elimination or substitution process is carried out. Elimination is required when one of the following conditions occurs:
• the phoneme does not match any IPA transcription
• the symbol does not exist in SAMPA notation
• the sound labelled in the MBROLA database cannot be matched with any other unused symbol for that particular language.

Substitution, on the other hand, is the process of changing the symbol declared in MBROLA into one that matches IPA and SAMPA. We will provide examples in a later section when we discuss issues specific to individual languages.

It is also important to highlight that we remove semi-diphthongs (also known as mixed diphthongs), in which the vowels /a/, /e/, /i/ and /u/ are followed by /r/, /l/, / ļ/ or /m/.
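The elimination and substitution rules described in Section 2.3 can be illustrated with a minimal sketch. This is not the procedure used in this study, only one possible reading of the criteria. The table SAMPA_TO_IPA is a deliberately tiny, hypothetical subset of SAMPA; the function name resolve, its parameters, and the idea of recording the sound heard from the synthesiser as an IPA string are all assumptions made for illustration.

```python
from typing import Optional, Set, Tuple

# Hypothetical, deliberately tiny subset of a SAMPA-to-IPA table.
# A real table would cover the full SAMPA standard.
SAMPA_TO_IPA = {"a": "a", "r": "r", "R": "ʁ", "4": "ɾ", "S": "ʃ", "N": "ŋ"}


def resolve(declared: str, heard_ipa: Optional[str],
            used: Set[str]) -> Tuple[str, Optional[str]]:
    """Decide what to do with one symbol declared in a language database.

    declared  -- the symbol as written in the database (assumed input)
    heard_ipa -- IPA transcription of the sound actually produced,
                 or None if no IPA transcription matches it
    used      -- symbols already in use for this language
    Returns ("keep" | "substitute" | "eliminate", resulting symbol or None).
    """
    # Condition 1: the sound does not match any IPA transcription.
    if heard_ipa is None:
        return ("eliminate", None)
    # The declared symbol is standard and agrees with the sound: keep it.
    if SAMPA_TO_IPA.get(declared) == heard_ipa:
        return ("keep", declared)
    # Otherwise look for an unused standard symbol that matches the sound.
    for symbol, ipa in SAMPA_TO_IPA.items():
        if ipa == heard_ipa and symbol not in used:
            return ("substitute", symbol)
    # No standard symbol is free for this language: eliminate.
    return ("eliminate", None)


print(resolve("a", "a", {"a"}))       # standard symbol, correct sound
print(resolve("r", "ʁ", {"a", "r"}))  # wrong symbol for the sound heard
print(resolve("sh", "ʃ", {"S"}))      # non-standard symbol, "S" already taken
```

In this sketch a non-standard symbol such as "sh" is only eliminated when the matching standard symbol is already taken in that language; if it were free, the entry would be substituted instead.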
