United States Patent (19) 11 Patent Number: 5,930,754 Karaali Et Al
Total Page:16
File Type:pdf, Size:1020Kb
USOO593.0754A United States Patent (19) 11 Patent Number: 5,930,754 Karaali et al. (45) Date of Patent: Jul. 27, 1999 54 METHOD, DEVICE AND ARTICLE OF OTHER PUBLICATIONS MANUFACTURE FOR NEURAL-NETWORK BASED ORTHOGRAPHY-PHONETICS “The Structure and Format of the DARPATIMIT CD-ROM TRANSFORMATION Prototype”, John S. Garofolo, National Institute of Stan dards and Technology. 75 Inventors: Orhan Karaali, Rolling Meadows; “Parallel Networks that Learn to Pronounce English Text' Corey Andrew Miller, Chicago, both Terrence J. Sejnowski and Charles R. Rosenberg, Complex of I11. Systems 1, 1987, pp. 145-168. 73 Assignee: Motorola, Inc., Schaumburg, Ill. Primary Examiner David R. Hudspeth ASSistant Examiner-Daniel Abebe 21 Appl. No.: 08/874,900 Attorney, Agent, or Firm-Darleen J. Stockley 22 Filed: Jun. 13, 1997 57 ABSTRACT A method (2000), device (2200) and article of manufacture (51) Int. Cl. .................................................. G10L 5/06 (2300) provide, in response to orthographic information, 52 U.S. Cl. ............... 704/259; 704/258; 704/232 efficient generation of a phonetic representation. The method 58 Field of Search ..................................... 704/259,258, provides for, in response to orthographic information, effi 704/232 cient generation of a phonetic representation, using the Steps of inputting an Orthography of a word and a predetermined 56) References Cited Set of input letter features, utilizing a neural network that has U.S. PATENT DOCUMENTS been trained using automatic letter phone alignment and predetermined letter features to provide a neural network 4,829,580 5/1989 Church. 5,040.218 8/1991 Vitale et al.. hypothesis of a word pronunciation. 5,668,926 9/1997 Karaali et al. .......................... 704/259 5,687,286 11/1997 Bar-Yam ................................. 704/259 61 Claims, 14 Drawing Sheets ORTHOGRAPHY INPUT LETTER 22O2 FEATURES 2204 MICROPROCESSOR/ APPLICATION-SPECIFIC INTEGRATED CIRCUIT/ COMBINATION MICROPROCESSOR AND APPLICATION-SPECIFIC INTEGRATED CIRCUIT 2208 ENCODER AUTOMATIC LETTER PHONE ALIGNMENT PRETRAINED ORTHOGRAPHY PRONUNCIATION NEURAL NETWORK PREDETERMINED LETTER FEATURES 2214 NEURAL NETWORK HYPOTHESIS OF A WORD PRONUNCIATION 2200 2216 U.S. Patent Jul. 27, 1999 Sheet 1 of 14 5,930,754 TEXT IN ORTHOGRAPHIC FORM 102 104 LINGUISTIC MODULE PHONETIC REPRESENTATION 106 108 ACOUSTIC MODULE SPEECH 110 100 A77G. Z -PRIOR ART ORTHOGRAPHY-PHONETICS LEXICON 202 204 NEURAL NETWORK INPUT CODING 206 NEURAL NETWORK TRAINING 208 FEATURE-BASED ERROR BACKPROPAGATION 200 A/VG.2 U.S. Patent Jul. 27, 1999 Sheet 2 of 14 5,930,754 TEXT 302 304 LINGUISTIC PREPROCESSING MODULE 320 306 ORTHOGRAPHY-PRONUNCIATION LEXICON PRONUNCIATION 308 DETERMINATION YES 318 NEURAL NETWORK ORTHOGRAPHY PRONUNCIATION CONVERTER POSTLEXICAL MODULE ACOUSTIC 314 MODULE SPEECH 316 300 A/VG.3 U.S. Patent Jul. 27, 1999 Sheet 3 of 14 5,930,754 ORTHOGRAPHY 402- STREAM 2 coat Niger, 3 if- 404Izo N u *co Tow atT-N STREAMif 1 PRONUNCIATION408 - / 10 164022 k ow. It 400 A77G.4 LOCATION 1 2 3 LETTER Is Ic h PHONE is k + uw location 1.27602 SEQUENCE 1 in du Is It + SEQUENCE 2 n + + + t e 600 A/VG.. 6 -PRIOR ART U.S. Patent Jul. 27, 1999 Sheet 4 of 14 5,930,754 ORTHOGRAPHY 1 c. 702 LETTER FEATURES FOR c 710 --OBSTRUENT --CONTINUANT ALVEOLAR VELAR +ASPIRATED LETTER FEATURES FOR O 712 --VOCALIC --WOWEL --SONORANT --CONTINUANT --BACK1 --BACK2 --LOW --LOW2 +VOICED --LONG +MID-LOW2 --ROUND1 --ROUND2 --MID-HIGH1 +MID-HIGH2 LETTER FEATURES FOR O 714 LETTER FEATURES FOR t 716 --OBSTRUENT --ALVEOLAR ASPIRATED --CONTINUANT +DENTAL --VOICED STREAM U.S. Patent Jul. 27, 1999 Sheet 5 of 14 5,930,754 52%.3%32 IP 800 AG, A/G.9 r ae p ih d HYPOTHESES U.S. Patent Jul. 27, 1999 Sheet 6 of 14 5,930,754 PHONE OX TARGET (dk) HYPOTHESIS (o) o 1 LOCAL ERROR 1 -1 (dk-ok) (dk-ok)? 1 (d-90°)-2 1100 A/VG. ZZ -PRIOR ART PHONE h OX TARGET (d) HYPOTHESIS (o 1 LOCAL ERROR 1(M=1) -.1(M=.1) M+(dk-ok) (M+(d-o))? | 1 .2121 1200 A77G. Z2 U.S. Patent Jul. 27, 1999 Sheet 7 of 14 5,930,754 101 201 4 504 103203 1316 106206 1302 H2,532 STREAM 3 STREAM 2 INPUT 127 INPUT EER 32 228 LETTERS FEATURES 15025,306 15327 36 126525 331 41 123219 1304 1318 INPUT INPUT BLOCK 1 BLOCK 2 1306 SIGMOID SIGMOID - 1320 NEURAL NEURAL NETWORK NETWORK BLOCK 5 BLOCK 4 1322 1310 -1- N 1314 STREAM 1 SOFTMAX SOFTMAX SOFTMAX SOFTMAX NEURAL NEURAL NEURAL NEURAL OUTPUT 1016 4022 NETWORK NETWORK NETWORK NETWORK PHONES BLOCK 6 BLOCK 7 BLOCK 5 BLOCK 8 3. OUTPUT BLOCK 9 1324 1326 1300 NEURALHYPOTHESIS NETWORK to saola AVGZ.3 U.S. Patent Jul. 27, 1999 Sheet 8 of 14 5,930,754 ORTHOGRAPHY 14O2 1404 NEURAL NETWORK INPUT CODING 1408 PRONUNCIATION 1410 1400 A77G. 74 1502 C t - 1504 1500 A/VG. Zao 1602 N 1604 k low + i t . Na 1606 k low t 1600 A77G. Z6 U.S. Patent Jul. 27, 1999 Sheet 9 of 14 5,930,754 101.201. 4 102.202304 1716 EE106 is 2,332 1702 STREAM 3 1292227 STREAM 2 INPUT 341 INPUT 3 15 120 LETTER 32 LETTERS FEATURES 150153 306 36 E2.22,331 250 124220 350 1704 1718 INPUT INPUT BLOCK 1 BLOCK 2 SIGMOID SIGMOID 1706 NEURAL NEURAL 1720 NETWORK NETWORK BLOCK 3 BLOCK 4 SOFTMAX SOFTMAX SOFTMAX SOFTMAX NEURAL NEURAL NEURAL NEURAL NETWORK NETWORK NETWORK NETWORK BLOCK 7 BLOCK 6 BLOCK 5 BLOCK 8 40 OUTPUT BLOCK 9 1722 1724 1700 NEURALHYPOTHESIS NETWORK to 64022 AzVGZ27 U.S. Patent Jul. 27, 1999 Sheet 11 of 14 5,930,754 4 102.202304 103203i.05 1916 206 3 3 2 1902 STREAM 3 1 2 7 ;; 3 4. 1 INPUT 228 STREAM 2 LETTER INPUT 3 15 20 FEATURES 2 5 O 6 LETTERS 3.3 2 1 7 3 3 1 1904 INPUT . 3 5 O BLOCK 1 1 2 4 2 2 O 1920-N SIGMOID NEURAL 1906 MOID NEURAL NETWORK BLOCKO 4 NETW ORK BLOCK 3 SO-5 ---><><-N SOFTMAX SOFTMAX SOFTMAX SOFTMAX NEURAL NEURAL NEURAL NEURAL NETWORK NETWORK NETWORK NETWORK BLOCK 5 BLOCK 6 BLOCK 7 BLOCK 8 1908 PHONE 2 FEATURES 1922 SOFTMAX SOFTMAX SEAM,PHONES 10 164022NEANETWORK NETWORKNEA NETWORKNEURAL NETWORKNEURAL BLOCK 9 BLOCK 10 BLOCK 11 BLOCK 12 1932 1926 Sita1930 1924 1934 1900 NEURALNoTESI" NETWORK 10 164022 AZVGZ.9 U.S. Patent Jul. 27, 1999 Sheet 12 of 14 5,930,754 A77G2O 2000 12/14 INPUTTING AN ORTHOGRAPHY OF A WORD AND A 2002 PREDETERMINED SET OF INPUT LETTER FEATURES UTILIZING A NEURAL NETWORK THAT HAS BEEN 2004 TRAINED USING AUTOMATIC LETTER PHONE ALIGNMENT AND PREDETERMINED LETTER FEATURES TO PROVIDE A NEURAL NETWORK HYPOTHESIS OF A WORD PRONUNCIATION A77G.2Z 2100 2102 PROVIDING A PREDETERMINED NUMBER OF LETTERS OF AN ASSOCIATED ORTHOGRAPHY CONSISTING OF LETTERS FOR THE WORD AND A PHONETIC REPRESENTATION CONSISTING OF PHONES FOR A TARGET PRONUNCIATION OF THE ASSOCIATED ORTHOGRAPHY 2104 ALIGNING THE ASSOCIATED ORTHOGRAPHY AND PHONETIC REPRESENTATION USING A DYNAMIC PROGRAMMING ALIGNMENT ENHANCED WITH A FEATURALLY-BASED SUBSTITUTION COST FUNCTION 2106 PROVIDING ACOUSTIC AND ARTICULATORY INFORMATION CORRESPONDING TO THE LETTERS, BASED ON A UNION OF FEATURES OF PREDETERMINED PHONES REPRESENTING EACH LETTER 2108 PROVIDING A PREDETERMINED AMOUNT OF CONTEXT INFORMATION 2110 TRAINING THE NEURAL NETWORK TO ASSOCIATE THE INPUT ORTHOGRAPHY WITH A PHONETIC REPRESENTATION 2172 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -?:- - - PROVIDING A PREDETERMINED NUMBER OF LAYERS OF OUTPUT REPROCESSING IN WHICH PHONES, NEIGHBORING PHONES, PHONE FEATURES AND NEIGHBORING PHONE FEATURE ARE PASSED TO SUCCEEDING LAYERS EMPLOYING A FEATURE-BASED ERROR FUNCTION TO CHARACTERIZE A DISTANCE BETWEEN TARGET AND HYPOTHESIZED PRONUNCIATIONS DURING TRAINING U.S. Patent Jul. 27, 1999 Sheet 13 of 14 5,930,754 ORTHOGRAPHY 13/14 INPUT LETTER 22O2 FEATURES 2204 MICROPROCESSOR/ APPLICATION-SPECIFIC INTEGRATED CIRCUIT/ COMBINATION MICROPROCESSOR AND APPLICATION-SPECIFIC INTEGRATED CIRCUIT 2208 ENCODER AUTOMATIC LETTER PHONE ALIGNMENT PRETRAINED ORTHOGRAPHY PRONUNCIATION NEURAL NETWORK PREDETERMINED LETTER FEATURES 2214 NEURAL NETWORK HYPOTHESIS OF A WORD 2200 PRONUNCIATION w 2216 A77G.22 ORTHOGRAPHY PREDETERMINED SET OF OF A WORD INPUT LETTER FEATURES 2302 2,304 ARTICLE OF MANUFACTURE 2308 2312 INPUTTING UNIT AUTOMATIC LETTER PHONE ALIGNMENT NEURAL NETWORK UTILIZATION UNIT PREDETERMINED LETTER FEATURES 2,314 NEURAL NETWORK HYPOTHESIS OF A WORD 2300 PRONUNCIATION 2316 A77G.23 U.S. Patent Jul. 27, 1999 Sheet 14 of 14 5,930,754 2404 PRONUNCIATION LEXICON 2402 UNTRAINED NEURAL NETWORK 2408 TRAINED NEURAL NETWORK 2410 PART OF PRONUNCIATION LEXICON 2400 A77G.24 5,930,754 1 2 METHOD, DEVICE AND ARTICLE OF of the phonetic featural content of orthography, and that is MANUFACTURE FOR NEURAL-NETWORK evaluated, and whose error is backpropagated, on the basis BASED ORTHOGRAPHY-PHONETICS of the featural content of the generated phones. A method, TRANSFORMATION device and article of manufacture for neural-network based orthography-phonetics transformation is needed. FIELD OF THE INVENTION BRIEF DESCRIPTION OF THE DRAWINGS The present invention relates to the generation of phonetic FIG. 1 is a Schematic representation of the transformation forms from orthography, with particular application in the of text to speech as is known in the art. field of Speech Synthesis. FIG. 2 is a Schematic representation of one embodiment 1O of the neural network training proceSS used in the training of BACKGROUND OF THE INVENTION the orthography-phonetics converter in accordance with the As shown in FIG. 1, numeral 100, text-to-speech synthe present invention. sis is the conversion of written or printed text (102) into FIG. 3 is a schematic representation of one embodiment speech (110). Text-to-speech synthesis offers the possibility 15 of the transformation of text to Speech employing the neural of providing voice output at a much lower cost than record network Orthography-phonetics converter in accordance ing Speech and playing that Speech back.