Issues in Text-To-Speech for French

Evelyne Tzoukermann
AT&T Bell Laboratories
600 Mountain Avenue, Murray Hill, N.J. 07974
evelyne@research.att.com

Abstract

This paper reports the progress of the French text-to-speech system being developed at AT&T Bell Laboratories as part of a larger project for multilingual text-to-speech systems, including languages such as Spanish, Italian, German, Russian, and Chinese. These systems, based on diphone and triphone concatenation, follow the general framework of the Bell Laboratories English TTS system [?], [?]. This paper provides a description of the approach, the current status of the French text-to-speech project, and some problems particular to French.

1 Introduction

In this paper, the new French text-to-speech system being developed at AT&T is presented; several steps have already been achieved while others are still in progress. First we present a brief description of the phonetic inventory of French, with a discussion of the approach used to select and segment phonetic units for the system. Methods for automatic segmentation, and for the choice of diphone and triphone units are presented. Some comments on durational and prosodic issues follow. We conclude with some discussion of directions for future improvement, including morphological analysis, part-of-speech tagging, and partial phrasal analysis for the purpose of phrasal grouping.

2 Phonetic Description of French

The French phonetic system consists of 36 phonemes, including 17 consonants, 16 vowels, and 3 semi-vowels. Table 1 shows the different phonemes; the IPA column contains the phonemes in the standard International Phonetic Alphabet; the second column, ASCII, shows the ASCII correspondence of these characters for the text-to-speech system, and the third column shows an example of the phoneme in a French word.

Consonants                  Vowels
IPA   ASCII   WORD          IPA   ASCII         WORD
p     p       paix          i     i             vive
t     t       tout          e     e             thé
k     k       cas           ɛ     [illegible]   aise
b     b       bas           a     a             table
d     d       dos           ɑ     a             âme
g     g       gai           ɔ     >             homme
m     m       mais          o     o             tôt
n     n       non           u     U             boue
ɲ     N       gagner        y     y             tour
l     l       livre         ø     [illegible]   eux
f     f       faux          œ     @             seul
s     s       si            ə     &             peser
ʃ     S       chanter       ɛ̃     I             bain
v     v       vive          ɑ̃     A             banc
z     z       zéro          ɔ̃     O             bon
ʒ     Z       jupe          œ̃     I             brun
r     r       rare          ə     A             samedi

Semi-vowels
IPA   ASCII   WORD
j     j       yeux
w     w       oui
ɥ     W       huit

Table 1: French Phonemes

For the French text-to-speech synthesis system we use 35 phonemes, consisting of 17 consonants, 15 vowels (and not 16 as in the IPA column), and 3 semi-vowels. As shown in Table 1, the fourth nasal /œ̃/ has been removed, /œ̃/ and /ɛ̃/ being represented by the single phoneme /ɛ̃/. The reasons for this change are that (1) /œ̃/ tends to be assimilated to the phoneme /ɛ̃/, and (2) this nasal vowel occurs in very few words in French. Thus, it could be said that functionally the distinction between /ɛ̃/ and /œ̃/ is minimal. French also contains two phonemes for the character "a", /a/ and /ɑ/, the first one being a front unrounded vowel and the second one a back rounded vowel. A small number of French speakers make this production and perceptual distinction; in addition, today's tendency shows a disappearance of this phonemic distinction. Therefore, only /a/, the more common phoneme of the two, was retained for synthesis.

Notice that two different "schwas" (or mute e), marked as /&/ and /A/, were retained for synthesis; since schwa in spoken French can be, in some cases, present or not depending on the level of formality of language, it is useful to have two different signs to account for this option. In addition, the grapheme-to-phoneme system used in the French TTS system and described in Section ?? is equipped with the capability of including or not the schwa depending on the level of language. For example, the sentence "je m'en vais samedi" (I am leaving on Saturday) can be said either /ʒə mɑ̃ vɛ samədi/ or, more colloquially, /ʒmɑ̃ vɛ samdi/, depending on whether the schwa is reduced or not. In our system, the last word of the sentence will be transcribed /samAdi/, A accounting for the trace of the schwa. An additional character, "*", was used to represent silences at the beginning and end of words.

French phonemes can also be viewed according to their spectral variability in the context of other phonemes. It is known that French vowels show spectral stability and low contextual variability [?], [?]. The voiceless fricatives show somewhat less spectral stability, then the plosives. The nasals and voiced fricatives present even less stability. Liquids (/l/ and /r/) and semi-vowels (/j/, /w/, /ɥ/) are the phonemes showing high variability, and this poses problems in diphone-based synthesis [?]. Liquids are very sensitive to their context; formantic structures show substantial effects of coarticulation. As for the semi-vowels, it is difficult to capture the zone of spectral stability. For these reasons, some researchers, e.g. [?], organize phonemic classification using the criterion of stable vs unstable phonemes rather than place of articulation.

Similar to the approach in the English TTS system, synthesis for French is done using prestored units. Within this framework, there are various strategies for the collection of units, units that will then constitute the dictionary of polyphones. Due to the continual aspect of the speech signal and the fact that the nature of phonemes is greatly modified in the context of other phonemes, synthesizing separate phonemes cannot capture articulatory aspects of the language. Additionally, transitions are harder to model than steady states. Thus, diphones are the standard minimal units in segmental synthesis. From an acoustic standpoint, a diphone can be seen as a signal passing from the central part of a phoneme to the central part of the subsequent phoneme; in other words, it is a unit composed of two half phonemes. At a segmental level, one can think of a diphone as a stored length of speech that goes from near the target of one phoneme and extends to near the target of the following one, in other words the transition [?].

The earliest diphone system was described by Peterson et al. [?]; other diphone approaches have been reported by [?], [?], [?], and [?]. Although there are only about 40 phonemes in English, about 1600 diphones suffice for synthesis. Nevertheless, because of numerous allophones and the fact that some diphones are not really context free, researchers like Peterson suggest that about 8000 diphones are needed for high-quality diphone synthesis. Moreover, the vowel diphthongs in English could be treated as pseudo-diphones. Early French synthesis systems [?] relied also on synthesis by diphones, except for the diphone [ɥi], which is integrated in a triphonic group. This phonemic pair was stored differently because of its high frequency in French in occurrences such as "lui" (him/her). In more recent work, systems contain diphones and larger units, such as triphones, quadriphones, and even quintophones [?] [?], in order to capture coarticulatory phenomena of a longer domain that would not be adequately modeled in a strictly diphonic system.

In the current system, the diphone inventory for French was built by taking 35² phonemic pairs, that is 1225 units. Added to that was the silence symbol in initial and final position, which adds another 70 phonemic pairs. From this initial set, the pairs of semi-vowels were removed. All the other combinations were kept. Even though not all of them occur in French lexical structure, they can still appear at inter-word boundaries. For example, the sequence /lr/ is not permitted word-internally, but must be handled since it appears in the inter-word assimilation in /val rjɛ̃/ "valent rien" (cost nothing). This is particularly important in French since inter-word liaison is common, as in /ɛl z ɔ̃/ "elles ont" (they have) vs /ɛl sɔ̃/ "elles sont" (they are), where the final consonant /s/ either undergoes liaison with the vowel /ɔ̃/ resulting in /z/, or undergoes linking with the consonant /s/ resulting in the devoiced sibilant.
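The inventory arithmetic described above (35 × 35 pairs, plus 70 pairs with the silence symbol, minus the semi-vowel pairs) can be illustrated with a short sketch. This is not the system's actual code. The consonant and semi-vowel labels and the "*" silence symbol follow Table 1 and the text; the vowel labels V1 to V15 are hypothetical placeholders, the function name is invented for illustration, and reading "pairs of semi-vowels" as the nine semi-vowel + semi-vowel combinations is an assumption.

```python
from itertools import product

# ASCII labels for the 17 consonants and 3 semi-vowels, as listed in Table 1.
CONSONANTS = ["p", "t", "k", "b", "d", "g", "m", "n", "N",
              "l", "f", "s", "S", "v", "z", "Z", "r"]
SEMI_VOWELS = ["j", "w", "W"]
# Hypothetical placeholder labels for the 15 vowels (only the count matters here).
VOWELS = [f"V{i}" for i in range(1, 16)]
SILENCE = "*"  # silence symbol used at word beginnings and ends


def build_diphone_inventory():
    """Return the set of (left, right) phoneme pairs kept in the inventory."""
    phonemes = CONSONANTS + VOWELS + SEMI_VOWELS  # 35 phonemes

    # Step 1: every ordered pair of phonemes (35 * 35 = 1225).
    pairs = set(product(phonemes, repeat=2))

    # Step 2: silence in initial and final position (35 + 35 = 70 more).
    pairs |= {(SILENCE, p) for p in phonemes}
    pairs |= {(p, SILENCE) for p in phonemes}

    # Step 3 (assumption): drop semi-vowel + semi-vowel pairs (3 * 3 = 9).
    semi = set(SEMI_VOWELS)
    pairs -= {(a, b) for (a, b) in pairs if a in semi and b in semi}
    return pairs


if __name__ == "__main__":
    inventory = build_diphone_inventory()
    print(len(inventory))            # 1225 + 70 - 9 = 1286 under the assumption above
    print(("l", "r") in inventory)   # /lr/ is kept for inter-word boundaries
```

Under these assumptions the inventory holds 1286 pairs, and a pair such as /lr/, which never occurs word-internally, remains available for sequences like "valent rien".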
