(12) United States Patent
     Feng et al.
(10) Patent No.: US 7,788,098 B2
(45) Date of Patent: Aug. 31, 2010

(54) PREDICTING TONE PATTERN INFORMATION FOR TEXTUAL INFORMATION USED IN TELECOMMUNICATION SYSTEMS
(75) Inventors: Ding Feng, Beijing (CN); Yang Cao, Beijing (CN)
(73) Assignee: Nokia Corporation, Espoo (FI)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1466 days.
(21) Appl. No.: 10/909,462
(22) Filed: Aug. 2, 2004
(65) Prior Publication Data
     US 2006/0025999 A1    Feb. 2, 2006
(51) Int. Cl.
     G10L 13/08 (2006.01)
     G10L 13/00 (2006.01)
(52) U.S. Cl.: 704/260; 704/258
(58) Field of Classification Search: 704/258, 704/260, 267
     See application file for complete search history.
(56) References Cited

U.S. PATENT DOCUMENTS
5,652,828 A *        7/1997   Silverman      704/260
5,905,972 A *        5/1999   Huang et al.   704/268
6,516,298 B1         2/2003   Kamai et al.   704/260
2002/0099547 A1 *    7/2002   Chu et al.     704/260
2002/0152067 A1 *    10/2002  Viikki et al.  704/231
2004/0006458 A1      1/2004   Fux et al.     704/8

FOREIGN PATENT DOCUMENTS
EP 1085401 A1 *      3/2001
EP 0 833 304 B1      3/2003
WO WO 03/065349      8/2003

OTHER PUBLICATIONS
Cao et al., "Decision Tree Based Mandarin Tone Model and its application to speech Recognition", 2000, IEEE, pp. 1759-1762.*
Shih et al., "Issues in Text-to-Speech Conversion for Mandarin", Aug. 1996, Computational Linguistics and Chinese Language Processing, vol. 1, No. 1, pp. 37-86.*
Lee et al., "The synthesis Rules in a Chinese Text-to-Speech System", Sep. 1989, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, No. 9, pp. 1309-1320.*
Yang Cao, et al., "Decision Tree Based Mandarin Tone Model and Its Application to Speech Recognition," National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, pp. 1759-1762, (C) 2000 IEEE, Beijing, P. R. China.
Xia Wang, et al., "An Embedded Multilingual Speech Recognition System for Mandarin, Cantonese and English," Nokia Research Center, Audio-Visual Systems Laboratory, pp. 758-764, (C) 2003 IEEE, Beijing, P. R. China.
Janne Suontausta et al., "Low Memory Decision Tree Method for Text-to-Phoneme Mapping," Nokia Research Center, Audio-Visual Systems Laboratory, pp. 135-140, (C) 2003 IEEE, Tampere, Finland.
Pui-Fung Wong, et al., "Decision Tree Based Tone Modeling for Chinese Speech Recognition," Hong Kong University of Science and Technology, pp. I-905-I-908, (C) 2004 IEEE, Hong Kong.
* cited by examiner

Primary Examiner: Richemond Dorvil
Assistant Examiner: Olujimi A Adesanya
(74) Attorney, Agent, or Firm: Foley & Lardner LLP

(57) ABSTRACT
The techniques described include generating tonal information from a textual entry and, further, applying this tonal information to PINYIN sequences using decision trees. For example, a method of predicting tone pattern information for textual information used in telecommunication systems includes parsing a textual entry into segments and identifying tonal information for the textual entry using the parsed segments. The tonal information can be generated with a decision tree. The method can also be implemented in a distributed system where the conversion is done at a back-end server and the information is sent to a communication device after a request.

13 Claims, 3 Drawing Sheets

[Drawing sheets 1 to 3 of 3 are not reproduced here. The front-page figure has an axis labeled F0 (Hz).]

PREDICTING TONE PATTERN INFORMATION FOR TEXTUAL INFORMATION USED IN TELECOMMUNICATION SYSTEMS

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to speech recognition and text-to-speech (TTS) synthesis technology in telecommunication systems. More particularly, the present invention relates to predicting tone pattern information for textual information used in telecommunication systems.

2. Description of the Related Art

This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the claims in this application and is not admitted to be prior art by inclusion in this section.

Voice can be used for input and output with mobile communication terminals. For example, speech recognition and text-to-speech (TTS) synthesis technology utilize voice for input and output with mobile terminals. Such technologies are particularly useful for disabled persons or when the mobile terminal user cannot easily use his or her hands. These technologies can also give vocal feedback such that the user does not have to look at the device.

Tone is crucial for Chinese (e.g., Mandarin, Cantonese, and other dialects) and other languages. Tone is mainly characterized by the shape of its fundamental frequency (F0) contour. For example, as illustrated in FIG. 1, Mandarin tones 1, 2, 3, and 4 can be described as high-level, high-rising, low-dipping, and high-falling, respectively. The neutral tone (tone 0) does not have a specific F0 contour; it is highly dependent on the preceding tone and is usually perceived to be temporally short.

Text-to-speech in tonal languages like Chinese is challenging because usually there is no tonal information available in the textual representation. Still, tonal information is crucial for understanding. Tone combinations of neighboring syllables can form certain tone patterns. Further, tone can significantly affect speech perception. For example, tone information is crucial to Chinese speech output. In English, an incorrect inflection of a sentence can render the sentence difficult to understand. In Chinese, an incorrect intonation of a single word can completely change its meaning.

In many cases, tone information of syllables is not available. For example, Chinese phone users can have names in a phone directory ("contact names") in PINYIN format. PINYIN is a system for transliterating Chinese ideograms into the Roman alphabet, officially adopted by the People's Republic of China in 1979. The PINYIN format used for the contact name may not include tonal information. It can be impossible to get tone information directly from the contact name itself. Without tone or with the incorrect tone, generated speech from text is of poor quality and can completely change [...]

[...] suggested. As indicated above, significant meanings can be lost without tonal information.

International patent application WO 03/065349 discloses adding tonal information to text-to-speech generation to improve understandability of the speech. The technique described by this patent application utilizes an analysis of the context of the sentence. Tone is identified based on the context in which the word is located. However, such context is not always available, particularly with communication systems such as mobile phones, nor does context always provide the clues needed to generate tonal information.

Thus, there is a need to predict tone patterns for a sequence of syllables without depending on the context. Further, there is a need to predict tone patterns to properly identify names used as contacts for a mobile device. Even further, there is a need to synthesize contact names in communication terminals when tone information is not available. Still further, there is a need to generate tonal information from text for languages like Chinese where tonal information is vital for communication and comprehension.

SUMMARY OF THE INVENTION

In general, the invention relates to generating tonal information from a textual entry and, further, applying this tonal information to PINYIN sequences using decision trees. At least one exemplary embodiment relates to a method of predicting tone pattern information for textual information used in computer systems. The method includes parsing a textual entry into segments and identifying tonal information for the textual entry using the parsed segments. The tonal information can be generated with a decision tree. The method can also be implemented in a distributed system where the conversion is done at a back-end server and the information is sent to a communication device after a request.

Another exemplary embodiment relates to a device that predicts tone pattern information for textual information based on the textual information and not the context of the textual information. The device includes a processing module and a memory. The processing module executes programmed instructions and the memory contains programmed instructions to parse a textual entry into segments and identify tonal information for the textual entry using the parsed segments.

Another exemplary embodiment relates to a system that predicts tone pattern information for textual information based on the textual information and not the context of the textual information. The system includes a terminal equipment device having one or more textual entries stored thereon and a processing module that parses textual entries into segments and identifies tonal information for the textual entries using the parsed segments.

Another exemplary embodiment relates to a computer program product having computer code that parses a textual entry into segments and identifies tonal information for the textual entry using the parsed segments.

BRIEF DESCRIPTION OF DRAWINGS

FIG.