Issues in Text-To-Speech for French
Evelyne Tzoukermann
AT&T Bell Laboratories
600 Mountain Avenue, Murray Hill, N.J. 07974
evelyne@research.att.com

Abstract

This paper reports the progress of the French text-to-speech system being developed at AT&T Bell Laboratories as part of a larger project for multilingual text-to-speech systems, including languages such as Spanish, Italian, German, Russian, and Chinese. These systems, based on diphone and triphone concatenation, follow the general framework of the Bell Laboratories English TTS system [?], [?]. This paper provides a description of the approach, the current status of the French text-to-speech project, and some problems particular to French.

1 Introduction

In this paper, the new French text-to-speech system being developed at AT&T is presented; several steps have already been achieved while others are still in progress. First we present a brief description of the phonetic inventory of French, with a discussion of the approach used to select and segment phonetic units for the system. Methods for automatic segmentation and for the choice of diphone and triphone units are presented. Some comments on durational and prosodic issues follow. We conclude with some discussion of directions for future improvement, including morphological analysis, part-of-speech tagging, and partial phrasal analysis for the purpose of phrasal grouping.

2 Phonetic Description of French

The French phonetic system consists of 36 phonemes, including 17 consonants, 16 vowels, and 3 semi-vowels. Table 1 shows the different phonemes; the IPA column contains the phonemes in the standard International Phonetic Alphabet; the second column, ASCII, shows the ASCII correspondence of these characters for the text-to-speech system; and the third column shows an example of the phoneme in a French word.

Consonants                  Vowels
IPA   ASCII   WORD          IPA   ASCII   WORD
p     p       paix          i     i       vive
t     t       tout          e     e       thé
k     k       cas           ɛ     ?       aise
b     b       bas           a     a       table
d     d       dos           ɑ     a       âme
g     g       gai           ɔ     >       homme
m     m       mais          o     o       tôt
n     n       non           u     U       boue
ɲ     N       gagner        y     y       tour
l     l       livre         ø     ?       eux
f     f       faux          œ     @       seul
s     s       si            ə     &       peser
ʃ     S       chanter       ɛ̃     I       bain
v     v       vive          ɑ̃     A       banc
z     z       zéro          ɔ̃     O       bon
ʒ     Z       jupe          œ̃     I       brun
r     r       rare          ə     A       samedi

Semi-vowels
IPA   ASCII   WORD
j     j       yeux
w     w       oui
ɥ     W       huit

Table 1: French Phonetic Phonemes

For the French text-to-speech synthesis system we use 35 phonemes, consisting of 17 consonants, 15 vowels (and not 16 as in the IPA column), and 3 semi-vowels. As shown in Table 1, the fourth nasal /œ̃/ has been removed, /œ̃/ and /ɛ̃/ being represented by the single phoneme /ɛ̃/. The reasons for this change are that (1) /œ̃/ tends to be assimilated to the phoneme /ɛ̃/, and (2) this nasal vowel occurs in very few words in French. Thus, it could be said that, functionally, the distinction between /œ̃/ and /ɛ̃/ is minimal.

French also contains two phonemes for the character "a", /a/ and /ɑ/, the first one being a front unrounded vowel and the second one a back rounded vowel. Only a small number of French speakers make this distinction in production and perception; in addition, today's tendency shows a disappearance of this phonemic distinction. Therefore, only /a/, the more common phoneme of the two, was retained for synthesis.

Notice that two different "schwas" (or mute e), marked as /&/ and /A/, were retained for synthesis; since a schwa in spoken French can, in some cases, be present or not depending on the level of formality of the language, it is useful to have two different signs to account for this option. In addition, the grapheme-to-phoneme system used in the French TTS system and described in Section ?? is equipped with the capability of including or omitting the schwa depending on the level of language. For example, the sentence "je m'en vais samedi", I am leaving on Saturday, can be said either /ʒə mɑ̃ vɛ samədi/ or, more colloquially, /ʒmɑ̃ vɛ samdi/, depending on whether the schwa is reduced or not. In our system, the sentence will be transcribed /ʒə mɑ̃ vɛ samAdi/, A accounting for the trace of the schwa. An additional character, "*", was used to represent silences at the beginning and end of words.

French phonemes can also be viewed according to their spectral variability in the context of other phonemes. It is known that French vowels show spectral stability and low contextual variability [?], [?]. The voiceless fricatives show somewhat less spectral stability than the plosives. The nasals and voiced fricatives present even less stability. Liquids (/l/ and /r/) and semi-vowels (/j/, /w/, /ɥ/) are the phonemes showing high variability, and this poses problems in diphone-based synthesis [?]. Liquids are very sensitive to their context; formantic structures show substantial effects of coarticulation. As for the semi-vowels, it is difficult to capture the zone of spectral stability. For these reasons, some researchers, e.g. [?], organize phonemic classification using the criterion of stable vs unstable phonemes rather than place of articulation.

Similar to the approach in the English TTS system, synthesis for French is done using prestored units. Within this framework, there are various strategies for the collection of units, units that will then constitute the dictionary of polyphones. Due to the continuous aspect of the speech signal and the fact that the nature of phonemes is greatly modified in the context of other phonemes, synthesizing separate phonemes cannot capture articulatory aspects of the language. Additionally, transitions are harder to model than steady states. Thus, diphones are the standard minimal units in segmental synthesis. From an acoustic standpoint, a diphone can be seen as a signal passing from the central part of a phoneme to the central part of the subsequent phoneme; in other words, it is a unit composed of two half phonemes. At a segmental level, one can think of a diphone as a stored length of speech that goes from near the target of one phoneme and extends to near the target of the following one, in other words the transition [?].

The earliest diphone system was described by Peterson et al. [?]; other diphone approaches have been reported by [?], [?], [?], and [?]. Although there are only about 40 phonemes in English, about 1600 diphones suffice for synthesis. Nevertheless, because of numerous allophones and the fact that some diphones are not really context free, researchers like Peterson suggest that about 8000 diphones are needed for high quality diphone synthesis. Moreover, the vowel diphthongs in English could be treated as pseudo-diphones. Early French synthesis systems [?] also relied on synthesis by diphones, except for the diphone [ɥi], which is integrated in a triphonic group. This phonemic pair was stored differently because of its high frequency in French in occurrences such as "lui" him/her. In more recent work, systems contain diphones and larger units, such as triphones, quadriphones, and even quintophones [?] [?], in order to capture coarticulatory phenomena of a longer domain that would not be adequately modeled in a strictly diphonic system.

In the current system, the diphone inventory for French was built by taking 35² phonemic pairs, that is 1225 units. Added to that was the silence symbol in initial and final position, which adds another 70 phonemic pairs. From this initial set, the pairs of semi-vowels were removed. All the other combinations were kept. Even though not all of them occur in French lexical structure, they can still appear at inter-word boundaries. For example, the sequence /lr/ is not permitted word-internally, but must be handled since it appears in the inter-word assimilation in /val rjɛ̃/ "valent rien" cost nothing. This is particularly important in French since inter-word liaison is common, as in /ɛl z ɔ̃/ "elles ont" they have vs /ɛl s ɔ̃/ "elles sont" they are, where the final consonant /s/ either undergoes liaison with the vowel /ɔ̃/, resulting in /z/, or undergoes linking with the consonant /s/, resulting in the devoiced sibilant.
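
The construction of the diphone inventory described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system's actual code: the phoneme labels are IPA stand-ins, "(ə)" is a hypothetical label for the optional schwa that the paper writes /A/, and "the pairs of semi-vowels" is assumed to mean the nine ordered semi-vowel + semi-vowel combinations.

```python
from itertools import product

# 35 system phonemes: 17 consonants, 15 vowels, 3 semi-vowels.
# Labels are illustrative IPA stand-ins, not the system's ASCII codes.
CONSONANTS = ["p", "t", "k", "b", "d", "g", "m", "n", "ɲ",
              "l", "f", "s", "ʃ", "v", "z", "ʒ", "r"]
VOWELS = ["i", "e", "ɛ", "a", "ɔ", "o", "u", "y", "ø", "œ",
          "ə", "(ə)",          # "(ə)": hypothetical label for the optional schwa
          "ɛ̃", "ɑ̃", "ɔ̃"]
SEMI_VOWELS = ["j", "w", "ɥ"]
SILENCE = "*"                  # silence at the beginning and end of words


def build_diphone_inventory():
    """Return the set of (left, right) phoneme pairs kept in the inventory."""
    phonemes = CONSONANTS + VOWELS + SEMI_VOWELS
    # All ordered phoneme pairs: 35 * 35 = 1225.
    pairs = set(product(phonemes, repeat=2))
    # Silence in initial and final position: 35 + 35 = 70 more pairs.
    pairs |= {(SILENCE, p) for p in phonemes}
    pairs |= {(p, SILENCE) for p in phonemes}
    # Assumption: remove every semi-vowel + semi-vowel pair (3 * 3 = 9).
    pairs -= set(product(SEMI_VOWELS, repeat=2))
    return pairs


inventory = build_diphone_inventory()
print(len(inventory))
```

Under these assumptions the inventory holds 1225 + 70 - 9 = 1286 pairs. Pairs such as ("l", "r") remain in the set even though /lr/ does not occur word-internally, which matches the requirement that inter-word sequences be covered.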