US007319958B2

(12) United States Patent
Melnar et al.
(10) Patent No.: US 7,319,958 B2
(45) Date of Patent: Jan. 15, 2008

(54) POLYPHONE NETWORK METHOD AND APPARATUS

(75) Inventors: Lynette Melnar, Austin, TX (US); Jim Talley, Austin, TX (US); Yuan-Jun Wei, Hoffman Estates, IL (US); Chen Liu, Lisle, IL (US)

(73) Assignee: Motorola, Inc., Schaumburg, IL (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 755 days.

(21) Appl. No.: 10/365,820

(22) Filed: Feb. 13, 2003

(65) Prior Publication Data: US 2004/0176078 A1, Sep. 9, 2004

(51) Int. Cl.: G10L 15/04 (2006.01)

(52) U.S. Cl.: 704/254

(58) Field of Classification Search: 704/251, 704/254, 277. See application file for complete search history.

Primary Examiner: Abul K. Azad

(57) ABSTRACT

Acoustic phones (preferably drawn 12 from a plurality of spoken languages) are provided 11. A hierarchically-organized polyphone network (20) organizes views of these phones of varying resolution and phone categorization as a function, at least in part, of phonetic similarity (14) and at least one language-independent phonological factor (15). In a preferred approach, a unique transcription system serves to represent the phones using only standard, printable ASCII characters, none of which comprises a special character (such as those characters that have a command significance for common script interpreters such as the UNIX command line).

25 Claims, 1 Drawing Sheet


[FIG. 1: flow diagram with blocks "Select plurality of spoken languages" (12), "Provide some phones", and "Provide polyphone network", the last drawing on "Phonetic similarity" and "Language independent phonological factor".]

[FIG. 2: hierarchical phone network (20) with levels 21, 22 and 23, showing nodes PHONE 1 through PHONE N, merged nodes such as PHONE 5', and PHONE X.]

R ------

MEMORY YODELS

35 Af7G 3 SPEECH RECOGNITION ENGINE US 7,319,958 B2 1. 2 POLYPHONE NETWORK MIETHOD AND tially all defined phonetic contrasts in a given set of targeted APPARATUS languages or dialects, such models also poorly Support any attempt to exploit any cross-language allophonetic coinci TECHNICAL FIELD dences that might also exist; in addition, the set of models 5 is often too large and too finely differentiated for cost This invention relates generally to acoustic phones and effective and efficient multilingual or multi-dialect auto more particularly to networks of multiple phones. matic speech recognition needs). For example, one approach employs an acoustic feature-based data-driven method to BACKGROUND achieve a kind of phone merger. Acoustic models from a 10 collection of monolingual speech recognition systems are Speech recognition techniques are known in the art. Many compared and acoustically similar models are merged. Such Such techniques utilize models that are selected as a function an approach indeed tends to at least reduce to some extent of a polyphone network. For example, a network of phones total memory requirements, but such an approach is also (wherein a “phone' is generally understood to typically comprise a speech Sound considered as a physical event relatively indiscriminate with respect to the grouping of without regard to its possible phonological status in the 15 acoustically similar phones (for example, this approach Sound system of a language) that comprise at least the tends to readily permit the merger of acoustically similar but primary phonemes (wherein a "phoneme' is generally phonologically contrastive models). This approach also fails understood to comprise an abstract unit of the phonological in large measure to Supply acoustic models of phones for system of a language that corresponds to a set of similar which little or no data is conveniently available. 
One sug speech Sounds that are perceived to be a single distinctive gested alteration to this latter approach has been to constrain Sound in the language) of a given spoken language (or the acoustic data-driven method as a function of language specific dialect thereof can comprise an acoustic foundation knowledge. This approach seeks to retain improved memory upon which one derives a set of models that a corresponding requirements while also attempting to address acoustic con speech recognition engine can utilize to effect recognition of fusability of distinct phones. Unfortunately, however, this a sample of uttered speech. 25 attempt at improvement still remains largely dependent upon Speech recognition platforms (such as, for example, a the availability and quality of language data. cellular telephone) that Support recognition of multiple Another impediment to fielding a commercially useful languages are also known. Unfortunately, current multilin result is the present practice of tending to represent phone gual (and/or multi-dialect) automatic speech recognition information with a transcription system that favors unique technologies face a number of practical constraints that 30 appear to significantly limit widespread commercial appli font sets and/or unusual characters that are typically non cation of such an approach. One problem involves the cost alphanumeric, printable characters that are commonly used of acquiring and/or the relative availability of relevant for special purposes in command line scripting as they have language resources and expertise. For example, while many special control interpretations (such as, for example, (a), hundreds of different acoustic phones are known to be >, \, 'I', '=', '{, , , *, '(', and so forth) in one or utilized globally in one language or another, any given 35 more relevant computer command and control protocols. 
language (or dialect) tends to use as phonemes only a For example, some characters used by Some transcription relatively small subset of this larger pool. The limited systems present phone information that is also interpreted by availability of resources and technology-savvy expert Unix command line interpreters as specific Unix commands. knowledge required to linguistically characterize a language This unfortunate proclivity further complicates the matter of for speech recognition engineering purposes (including, but 40 attempting to provide a flexible, efficient polylingual speech not limited to, specifically identifying those phonemes that recognition method and apparatus. appropriately characterize a given language) are an impedi ment to broad language coverage. Further, the time and BRIEF DESCRIPTION OF THE DRAWINGS expense of creating, finding, or otherwise acquiring appro priate acoustic speech data (with associated transcriptions, 45 lexica, and so forth) of acceptable quality and quantity to The above needs are at least partially met through provi permit training of speech models often make Such endeavors sion of the polyphone network method and apparatus commercially unfeasible, especially for consumer popula described in the following detailed description, particularly tions that represent a relatively small speaker group and/or when studied in conjunction with the drawings, wherein: an emerging market. 50 FIG. 1 comprises a flow diagram as configured in accor Computational resource limitations present another prob dance with an embodiment of the invention; lem. A not atypical prior art approach combines a plurality FIG. 2 comprises a schematic representation of a hierar of monolingual speech recognition systems to consolidate chical phone correspondence network as configured in Sufficient capability to Support multiple languages and/or dialects. 
With Such an approach, however, requirements for accordance with an embodiment of the invention; and both the necessary language resources and computational 55 FIG. 3 comprises a block diagram of a speech recognition resources increase Substantially proportionally with each platform as configured in accordance with an embodiment of incremental Supported language and/or dialect. The costs the invention. and/or form factor constraints associated with Such needs Skilled artisans will appreciate that elements in the figures can again influence designers away from including lan are illustrated for simplicity and clarity and have not nec guages that correspond to Smaller speaker populations and/ 60 essarily been drawn to scale. For example, the dimensions of or Smaller present marketing opportunities. Some of the elements in the figures may be exaggerated To attempt to meet such problems, some effort has been relative to other elements to help to improve understanding made to consider sharing acoustic models across a plurality of various embodiments of the present invention. Also, of languages and/or dialects. Such an approach typically common but well-understood elements that are useful or requires alteration to the fundamental approach by which 65 necessary in a commercially feasible embodiment are typi models are developed (for example, while models that result cally not depicted in order to facilitate a less obstructed view from the approaches noted earlier tend to preserve Substan of these various embodiments of the present invention. US 7,319,958 B2 3 4 DETAILED DESCRIPTION rality of spoken languages. If desired, of course, one may optionally first begin by selecting 12 a plurality of specifi Generally speaking, pursuant to these various embodi cally selected spoken languages. 
If desired, the selected ments, phonological similarities across a plurality of lan languages can constitute a significant portion of all currently guages (and/or dialects) are leveraged in a way that is at least known and spoken languages (such as, for example, at least partially based upon language-independent factors. More five or even twenty-five percent of all such candidate particularly, a plurality of phones that correspond to a languages). Regardless, however, it is preferred that the plurality of spoken languages can serve to provide a poly provided phones be drawn from a plurality of languages phone network as a function, at least in part, of at least one wherein at least two, and preferably more, of the languages of phonetic similarity as may exist between the phones and 10 are substantially typologically diverse with respect to one at least one language-independent phonological factor. In a preferred embodiment, the plurality of phones correspond to another (in general, significant diversity as between utilized a plurality of languages including at least some languages languages will tend to better facilitate the provision of a rich that are Substantially typologically diverse with respect to and enabling polyphone network than mere quantity of one another. Such diversity enriches the resultant polyphone typologically similar languages). Also in a preferred network and tends to render such a network more likely to 15 approach, when providing 11 the phones, at least Some of the provide satisfactory models even for languages that are not phones should correspond to phonemes that are themselves otherwise specifically represented. Pursuant to another pre Substantially unique to each of at least some of the plurality ferred approach, at least some of the plurality of phones are of spoken languages (and where preferably all such unique based, at least in part, upon at least some phonemes that are phonemes are so represented by Such phones). 
Again, cap themselves substantially unique to only a few (or one) of the turing Such instances of uniqueness will tend in the aggre represented languages (in one embodiment, all phonemes gate to enrich the acoustic qualities of the resultant poly that are unique to a particular language amongst the plurality phone network. of languages are so represented). Various transcription systems exist by which to express In a preferred embodiment, the polyphone network com such phones. It is therefore possible to provide 11 the phones prises a hierarchical phone correspondence network that 25 as described above using any of a wide variety of known or preferably organizes merged phone selections in a tree. In hereafter developed transcription systems. As noted earlier, one embodiment, the tree can comprise at least one binary however, most existing transcription systems tend to require branching tree. Depending upon the embodiment, phone either unique fonts and/or include characters that have information as contained by the polyphone network can be additional instructional meaning for many processing plat viewed at varying levels of resolution (such that, for 30 forms. 
In a preferred approach, therefore, the phones are example, higher resolution information may include at least represented using a transcription system that comprises only some phone information that comprises phonemes for at standard, printable ASCII characters and wherein none of least some of the plurality of spoken languages and lower the characters utilized comprise a special character having a resolution information may include phone information that corresponding computer operation command meaning with is representative, at least in part, of cross-linguistic and respect to the desired operating environment (for example, language-independent mutual phonetic similarity and non 35 when using Unix to implement the relevant platform(s), contrastiveness of corresponding higher resolution phone none of the transcription characters should have a command information views). meaning to a Unix command line interpreters). A description Depending upon the embodiment, the language-indepen of a new preferred transcription system that accords with dent phonological factor can comprise information Such as these design criteria follows. language-independent phonological contrast, cross-linguis 40 This transcription system is an ASCII-encoded speech tic phone frequency, predetermined linguistic tendencies, symbol inventory restricted to lower-case alphabetic char and predetermined linguistic universals. acters, numerals, and the non-alphanumeric symbol . In general, the resultant polyphone network is preferably Each individual symbol is associated with a specific pho Substantially unbiased towards any particular language, lan netic feature (or feature constellation) dependent on the type guage family, and/or language type. 45 of phone represented ( or vowel) and its syntactic In a preferred embodiment, the phones are represented in position in the phone symbol string. 
In this embodiment, the the polyphonic network using a transcription system that basic character length of any consonant or vowel phone comprises only standard ASCII characters (none of which representation is two; this obligatory symbol pair is referred comprises a control character or other special character to as a phone's base sign. Pursuant to a preferred embodi (hereinafter referred to generally as “special characters') 50 ment, a phone's phonetic features are directly encoded into having a corresponding computer operation command the label. The first symbol of a base sign unambiguously meaning nor to which Unix command line interpreters marks the phone as either a consonant or vowel. All non ascribe special significance) Such that the transcription sys tonal diacritic symbols are sorted alphabetically behind the tem characters only require a standard font set and Substan base sign and are framed by the marker . Tonal diacritics tially avoid programming or Scripting collisions. are suffixed to the right diacritic marker of vowel phone So configured, a polyphonic network can be provided that 55 strings. An example of a simple consonant phone represen will tend to well Support polylingual purposes, including at tation, one with only the two obligatory base-sign positions least Substantial accommodation of languages that are not filled, is kp, where the first position k is associated with specifically represented by the network. The resultant net the consonant class of phones and indexes velar articula work (and speech recognition models based thereon) can tion and the consonant second position p' indexes also be well accommodated by relatively modest computa 60 less . Thus, kp is a (/k/ in tional resources including reduced memory needs as com the International Phonetic Alphabet “IPA). A complex pared to prior art counterparts. Vowel symbol String, that is, a Vowel representation that Referring now to the drawings, and in particular to FIG. 
includes diacritics Suffixed to the base sign, is exemplified 1, in a preferred approach, the process begins with provision by iij 4, where vowel first position i indexes 11 of a plurality of phones. In some cases, these phones may 65 unrounded front, vowel second position i signifies be at least partially provided through use of one or more close, vowel third position' is palatal offglide, and forth previously compiled/selected phones as drawn from a plu position “4” is low tone (/i/ in the IPA). The consonant and US 7,319,958 B2 5 6 Vowel position classes are defined below, along with an inventory of the position class constituents. TABLE 2 A preferred consonant phone string has the following position structure, from left to right: Vowel Symbol Inventory by Position Class 1 position (obligatory): Symbol representing primary 5 1st Position 2" Position 3 Position place(s) of articulation; (back, +f-r) (openness) (diacritics) 4" Position (tones) front-r i close i primary stress 1 extra-high 1 2" position (obligatory): symbol representing primary front-r y close, lax h secondary 2 high 2 manner(s) of articulation and Voice (+/-V); and StreSS 10 central e close-mid e tertiary stress 3 mid 3 3" position: sorted diacritic symbols, framed left and - right by the diacritic marker central o mid X breathy voiced c OW 4 -- The consonant phone symbols and their interpretations back-r a open-mid O central offglide e extra-low 5 are presented by position class in Table 1. back-r u open2-mid c laryngealized h rising 32 15 open a palatal offglide alling 34 Within a given consonant symbol string, the first and Schwar r velar stricture k high-rising 21 second positions are only occupied by one symbol each. The (wife) first position symbol of a consonant phone label comprises long high- 24 alling a consonant symbol in traditional Romanized orthography labial inglide m low-rising 42 (excluding y' as noted below). 
This convention unambigu nasalized l ow-falling 45 ously marks the label as a consonant phone. devoiced o rising- 424 alling The third position is not limited in length, but the diacritic retracted r set will preferably be sorted. Note that all symbols in the retroflexed consonant inventory are lower-case alphabetic characters overshort S coronal inglide t and that any particular character may be reused with a 25 labial offglide w different meaning in a different syntactic position in the pharyngealized X phone string. Thus, a first position p' means bilabial while palatal inglide y a second position p' means voiceless plosive. The sign pp., therefore, represents a voiceless bilabial plosive. Sec A vowel phone label preferably begins with either a ondary places and manners of articulation are given in the 30 traditional vowel symbol from Romanized orthography or third position diacritic class. As an example, an aspirated the character y. Thus, any label beginning with a, e, i. voiceless bilabial plosive is pp c . 'o', 'u', or y unambiguously marks a vowel phone. Like the consonant position-class inventories, within a TABLE 1. given phone symbol String each constituent of the first and 35 second obligatory positions are in a disjunctive relation to Consonant Symbol Inventory by Position Class other constituents in their same class. Furthermore, each 2". Position (MOA, constituent of these positions is a lower-case alphabetic Position (POA) +f-v) 3" Position (diacritics) character and any particular character may be reused in a different position where it is associated with a distinct Bilabial plosivef-v p aspirated/breathy C 40 meaning. Thus, an unrounded open is repre abio-dental plosive?+v b lateralized d abio-palatal m nasal (+V) in laryngealized h sented by the string 'aa, where 'a' indexes back unrounded abio-velar b fricativef-v S palatalized j in Vowel first position and open in Vowel second position. 
Dental d ?+v Z velarized k Non-tonal diacritics are sorted and framed by the diacritic Alveolar lateral fricativef-v long marker and all tonal diacritics are represented by num alveo-palatal g lateral fricative?+v y prenasalized l 45 bers and follow the right diacritic marker. So, for example, alveo-lateral approximate (+V) w nasal (ized/release) in a long unrounded open-mid associated with high Retroflex r lateral-approx. (+V) I devoiced O tone would be symbolized as 'io 1 1. Palatal affricatef-v f labialized p So configured, this transcription system is designed to be Wellar & affricatef-v v imploded C Ovular C tap? flap (+v) d syllabic S usable from both a processing and a human-factors perspec Pharyngeal X nasal tap, flap (+V) m ejected t 50 tive. Because the symbol string is structurally defined and Epiglottal C trill (+v) r non- w delimited, and because its symbol inventory lacks poten Glottal l pre-aspirated W tially ambiguous signs in a programming environment, these pharyngealized X transcription strings are easy to parse and manipulate. not released Z. Because individual symbols are consistently used within their type (consonant or vowel) and class (position number) 55 and were chosen to be as congruent as possible with symbols A preferred vowel phone string has the following position from pre-existing phonetic symbol inventories, and because structure, from left to right: string syntax is well defined, this transcription system is 1 position (obligatory): Symbol representing backness/ relatively easily learned and expanded (unlike, for example, the IPA system). roundness (back, +/-r); 60 As noted above, each symbol string preferably consists of 2" position (obligatory): symbol representing openness; a two character base sign to which diacritic markers may be 3' position: sorted diacritic symbols, framed left and suffixed. 
Suffixes only terminate in either the right-frame right by the diacritic marker ; and diacritic marker or a right frame diacritic marker fol lowed by one or more numerals. In this embodiment, the first 4" position: number representing tone. 65 position character of a base sign always marks the string as The preferred constituents of the vowel position classes either consonant or vowel, i.e., there is no overlap of are presented in Table 2. characters between the first position consonant and vowel US 7,319,958 B2 7 8 symbol inventories. Finally, there is no use of symbols that highest resolution view 21 in turn comprises phones that are employed in a special way in programming. Thus, every represent a merger of two or more lower level phones (to put symbol-string label indexing a distinct phone can be easily this another way, this lower resolution view provides a lower identified and parsed with minimal programming effort, and resolution view of phone information comprised of phone it is not generally necessary (as it is with many other 5 categories). To illustrate, PHONE 1, PHONE 2, and PHONE phonetic alphabets) to build a special-purpose scanner 3 from the first highest resolution level 21 have been merged designed to convert these labels to other label types (num into PHONE 1 at the next level of resolution 22 while bers, for example). Each individual character used in the above described PHONE5 and PHONE 6 have been merged into PHONE 5'. preferred transcription system comprises a sign, or form In general, the resultant lower resolution phone will com meaning pair. The relationship between a character's 10 prise a selected one of the merged higher resolution phones. graphic form and its phonetic meaning is one-to-many and Details regarding merging criteria are provided below. It uniquely dependent on the sign’s context, that is, its place in should also be noted that not all of the higher resolution the symbol-string label. 
For example, the form w is asso phones are necessarily merged with another phone at this ciated with three distinct articulation features or feature second level 22 in the hierarchy. Instead, to illustrate, constellations: 1) voicing and approximation, 2) pre-aspira 15 PHONE 4 and PHONE 9 both remain unmerged at this tion, and 3) labial offgliding. Notwithstanding the above, the second hierarchical level 22. meaning of any particular w is never ambiguous: w is In a similar fashion, the resultant phone categories (and associated only with meaning (1) as a second position remaining unmerged phones of the first hierarchical level consonant sign; it indexes meaning (2) as a third position 21) are again merged as and how appropriate at a next consonant diacritic; and as a third position vowel diacritic, yet-lower-resolution hierarchical level 23. For example, it is uniquely linked to meaning (3). PHONE 1 and PHONE 4 are merged at this level to yield Most alphabetic characters are strongly associated with PHONE 1°. In a preferred embodiment, fifteen such levels phonetic meanings that users literate in Romanized orthog are provided (with the intervening levels being indicated by raphies learned during childhood. Because of this, there is an the ellipses 24 depicted), with the final lowest resolution attempt in most multilingual phonetic inventories to con 25 form as much as possible to pre-existing form-meaning view 25 preferably comprising a final category PHONE X'. associations, and the above described preferred transcription Conceptually, this final hierarchical level 25 comprises a system mirrors this viewpoint. Thus the symbol i is typi single Sound that purports to grossly characterize the entire cally associated with the vowel features close, front and polyphone network, unrounded and this proposed transcription system observes So configured, these various views provide phone catego this conventional sense as well. 
Only these features are divided between two classes: vowel position 1 (where 'i' means front and unrounded) and vowel position 2 (where it means close). By unpacking a character form's typically associated feature set, it becomes possible to use the form in several distinct signs, each associated with just one or two features of the source sign.
There are presently several pre-existing phonetic alphabets in common use within the speech research community. XSAMPA and Worldbet, for example, are relatively widely employed for multilingual applications (though again, these systems tend to suffer drawbacks associated with computer processing applications and conflicts). Appendix A attached hereto presents an abbreviated base-sign inventory for the preferred transcription system (denoted as "new" in the appendix) described above and further maps those contents to the XSAMPA, Worldbet, and the IPA phonetic alphabets for the convenience of the reader.
With continued reference to FIG. 1, the provided phones 11 are then used to provide 13 a polyphone network as a function, at least in part, of phonetic similarity 14 as may exist between certain of the provided phones and at least one language-independent phonological factor 15 (such as, for example, at least one of language-independent phonological contrast, cross-linguistic phone frequency, predetermined linguistic tendencies, and predetermined linguistic universals). So configured, the resultant polyphone network will tend to be substantially unbiased towards any particular language, language family, and/or language type.
In a preferred approach, the polyphone network includes and/or comprises a hierarchical phone correspondence network. In one embodiment, that hierarchy can be configured as a plurality of merged phone selections organized in at least one tree (such as, for example, a binary branching tree hierarchical arrangement). For example, and referring now to FIG. 2, all of the phones (such as, for example, PHONE 1 and PHONE 2) that comprise the network can share a common highest resolution view 21 or level within the polyphone network 20. A next level or view 22 up from this
ries that correspond to views of varying resolution wherein the phone information essentially represents, at least in part, cross-linguistic and language-independent mutual phonetic similarities and non-contrastiveness of corresponding higher resolution phone information views. As the degree of resolution drops by moving higher in the tree, at least some of the merged phone nodes each comprise a merger of a plurality of phone categories, which phone categories are themselves each most representative of a given plurality of phone categories.
For many tasks, of course, it is not feasible to use a narrow phone transcription as represented at the highest resolution levels of the suggested polyphone network 20. All languages collapse distinct phones that are non-distinct phonologically into phonemes. Using the preferred transcription system detailed above, consider that in English, for example, an alveolar flap (td), a voiceless alveolar stop (tp) and a voiceless aspirated alveolar stop (tp c) are all allophones of the phoneme represented by the English alphabetic symbol t. Likewise, it is possible for distinct phonemes across languages (though not distinct phonemes internal to a language) to be collapsed into a merged superphoneme classification system. Thus, for example, it is usually the case that a voiceless dental stop phoneme (dp) and a voiceless alveolar stop phoneme (tp) may share acoustic models without losing crucial cross-linguistic phonological distinctions. Such knowledge can be exploited when reducing a multilingual model inventory to a size that is more compatible with recognition systems in embedded devices.
The polyphone network proposed herein comprises a phone merger specification tool for shared multilingual and multi-dialect acoustic modeling that takes advantage of phonetic similarities (as inherently and conveniently defined by the preferred transcription system described above) while maximally preserving known phonological contrasts. In a preferred embodiment, the polyphone network comprises a binary branching tree that has as its leaves a nearly exhaustive inventory of core phones derived from a rich set of typologically diverse languages (where preferably at least all major language families are represented) (and wherein core phone should be understood to refer to an allophone that is most representative of a particular phoneme, defined cross-linguistically). Lower resolution nodes on the tree represent superphonetic categories with merger being preferably based entirely on phonetic similarity, core phone frequency, and language-independent (universal) phonological laws and tendencies.
As an illustration of how language-independent factors can contribute to such a hierarchical arrangement of phone classes, a detailed example is provided below:
Consider the following set of language universals, tendencies, and frequency observations:
Language universal (1): all languages have stop consonants;
Language tendency (1): languages tend to have a plain voiceless stop consonant series;
Frequency observation (1): p, t and k are the most frequently occurring stop consonants in the world's inventories, with t being the most common and p the least common of the three;
Language universal (2): t
where a caret (^) indexes a node in the tree and the numbers signify a hierarchical level, beginning with 0 at the leaves. Therefore, tp^14 is the parent node of the children nodes tp^13 and kp^11, and tp^13 has the children nodes tp^12 and pp^10.
At each node in the tree a confidence score between 1 and 10 appears in square brackets, where 10 indicates very high confidence in the merge and 1 indicates very low confidence in the merge. In the above examples all nodes have a confidence score of 2. These scores are preferably derived from three knowledge sources: 1) relative phonetic similarity (defined automatically by the transcription structure), 2) relative phonological contrastiveness (that is, how often do the phones included in the children nodes contrast phonologically), and 3) frequency (that is, how frequently the phones included in the children nodes occur cross-linguistically).
The above example illustrates how the preferred polyphone network provides a novel, linguistically informed framework within which the phone model inventory size and search space issues inherent to embedded multilingual or
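The node relationships described above (tp^14 over tp^13 and kp^11, and tp^13 over tp^12 and pp^10, each merge carrying a confidence score of 2) can be illustrated with a minimal Python sketch. The `Node` class and the `cut` function are hypothetical names introduced only for this illustration and do not appear in the specification; the sketch also assumes, for brevity, that tp^12, kp^11 and pp^10 are the lowest nodes known, although in the full network they would themselves have children.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One node of a binary merge tree (hypothetical helper type)."""
    base: str                           # node name before the caret, e.g. "tp"
    level: int                          # hierarchical level after the caret
    confidence: Optional[int] = None    # merge confidence, 1 (low) to 10 (high)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def label(self) -> str:
        # Render the node the way the text writes it, e.g. "tp^14".
        return f"{self.base}^{self.level}"


def cut(node: Node, max_level: int) -> List[Node]:
    """Return the nodes that serve as shared models when no merge above
    max_level is accepted. A node without children is returned as-is."""
    if node.level <= max_level or node.left is None or node.right is None:
        return [node]
    return cut(node.left, max_level) + cut(node.right, max_level)


# The example from the text; every node has a confidence score of 2.
tp12 = Node("tp", 12, confidence=2)
pp10 = Node("pp", 10, confidence=2)
kp11 = Node("kp", 11, confidence=2)
tp13 = Node("tp", 13, confidence=2, left=tp12, right=pp10)
root = Node("tp", 14, confidence=2, left=tp13, right=kp11)

for level in (14, 13, 12):
    print(level, [n.label for n in cut(root, level)])
```

Lowering `max_level` moves toward a higher resolution view with more separate models, while raising it merges more phones into one shared model, which is the inventory-size trade-off the text describes for embedded recognizers. A threshold on `confidence` could be added in the same way, but how the scores would be applied is not stated in this passage.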

Consonants: nasals
PLACE : MANNER/VOICE             New   IPA   XSAMPA   WORLD
bilabial : nasal (+v)
labio-dental : nasal (+v)
dental : nasal (+v)
alveolar : nasal (+v)
alveo-palatal : nasal (+v)
retroflex : nasal (+v)
palatal : nasal (+v)
velar : nasal (+v)
uvular : nasal (+v)

Consonants: nasal taps/flaps
PLACE : MANNER/VOICE                  New   XSAMPA
dental : nasal tap/flap (+v)          dm
alveolar : nasal tap/flap (+v)        tm
alveo-palatal : nasal tap/flap (+v)   gm
retroflex : nasal tap/flap (+v)

Consonants: taps/flaps
PLACE : MANNER/VOICE                New   IPA   WORLD
bilabial : tap/flap (+v)
dental : tap/flap (+v)
alveolar : tap/flap (+v)
alveolar-lateral : tap/flap (+v)
alveo-palatal : tap/flap (+v)
retroflex : tap/flap (+v)
uvular : tap/flap (+v)

Consonants: trills
PLACE : MANNER/VOICE           IPA   XSAMPA
bilabial : trill (+v)                B\
dental : trill (+v)
alveolar : trill (+v)
alveo-palatal : trill (+v)
retroflex : trill (+v)
uvular : trill (+v)                  R\

Consonants: fricatives
PLACE : MANNER/VOICE             New   IPA   XSAMPA
bilabial : fricative/-v
bilabial : fricative/+v
labio-dental : fricative/-v
labio-dental : fricative/+v
labio-velar : fricative/-v
dental : fricative/-v
dental : fricative/+v
alveolar : fricative/-v
alveolar : fricative/+v
alveo-palatal : fricative/-v
alveo-palatal : fricative/+v
retroflex : fricative/-v
retroflex : fricative/+v
palatal : fricative/-v
palatal : fricative/+v
velar : fricative/-v
velar : fricative/+v
uvular : fricative/-v
uvular : fricative/+v
pharyngeal : fricative/-v
pharyngeal : fricative/+v
epiglottal : fricative/-v        cs
epiglottal : fricative/+v        cz
glottal : fricative/-v           hs
glottal : fricative/+v           hz

Consonants: lateral fricatives
PLACE : MANNER/VOICE                     New   IPA   WORLD
dental : lateral fricative/-v
dental : lateral fricative/+v            dy
alveolar : lateral fricative/-v
alveolar : lateral fricative/+v          ty
alveo-palatal : lateral fricative/-v
alveo-palatal : lateral fricative/+v     gy

Consonants: affricates
PLACE : MANNER/VOICE             New   IPA   WORLD
bilabial : affricate/-v          pf
bilabial : affricate/+v          pv
labio-dental : affricate/-v      ff
labio-dental : affricate/+v      fv
dental : affricate/-v            df
dental : affricate/+v            dv
alveolar : affricate/-v          tf
alveolar : affricate/+v          tv
alveo-palatal : affricate/-v     gf
alveo-palatal : affricate/+v     gv
retroflex : affricate/-v         rf
retroflex : affricate/+v         rv
palatal : affricate/-v           jf
palatal : affricate/+v           jv
velar : affricate/-v             kf
velar : affricate/+v             kv
uvular : affricate/-v            qf
uvular : affricate/+v            qv

Consonants: approximants
PLACE : MANNER/VOICE                New   IPA   XSAMPA   WORLD
labio-dental : approx. (+v)         fw
labio-palatal : approx. (+v)
labio-velar : approx. (+v)          bw
alveolar : approx. (+v)
retroflex : approx. (+v)
palatal : approx. (+v)              jw    j     j        j
velar : approx. (+v)                kw
lab-dent : lateral approx. (+v)
dental : lateral approx. (+v)
alveolar : lateral approx. (+v)
retroflex : lateral approx. (+v)
palatal : lateral approx. (+v)      jl
velar : lateral approx. (+v)        kl

Vowels
BACKNESS/+-R : OPENNESS       New   IPA   XSAMPA   WORLD
front/-r : close              ii    i     i        i
front/+r : close              yi    y     y        y
front/-r : close, lax         ih
front/+r : close, lax         yh
front/-r : close-mid          ie
front/+r : close-mid          ye
front/-r : open-mid           io
front/+r : open-mid           yo
front/-r : open2-mid          ic
central : open                ea
front/+r : open               ya
central/-r : close            ei
central/+r : close            oi
central/-r : close-mid        ee
central/+r : close-mid        oe
central (schwa)               ex
schwa+r                       er
central/-r : open-mid         eo
central/+r : open-mid         oo
central : open2-mid           ec
back/-r : close               ai
back/+r : close               ui
back/+r : close, lax          uh
back/-r : close-mid           ae
back/+r : close-mid           ue
back/-r : open-mid            ao
back/+r : open-mid            uo
back/-r : open                aa
back/+r : open                ua
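As a small illustration of how a two-letter vowel sign in the "New" column can be unpacked into its features, the following Python sketch decodes position 1 as backness and rounding and position 2 as openness. The text states this only for 'i' (front and unrounded in position 1, close in position 2); the remaining letter values are assumptions read off the rows of the vowel table, the gloss "mid" for 'x' is likewise an assumption, and the names `POSITION_1`, `POSITION_2` and `decode_vowel` are hypothetical.

```python
# Position 1: backness and rounding (+r rounded, -r unrounded).
# Only the 'i' entry is stated in the text; the rest are assumed from the table.
POSITION_1 = {
    "i": ("front", "-r"),
    "y": ("front", "+r"),
    "e": ("central", "-r"),
    "o": ("central", "+r"),
    "a": ("back", "-r"),
    "u": ("back", "+r"),
}

# Position 2: openness. Only the 'i' entry is stated in the text.
POSITION_2 = {
    "i": "close",
    "h": "close, lax",
    "e": "close-mid",
    "x": "mid",
    "o": "open-mid",
    "c": "open2-mid",
    "a": "open",
}


def decode_vowel(sign: str) -> dict:
    """Split a two-letter vowel base sign into its assumed features."""
    if len(sign) != 2:
        raise ValueError(f"expected a two-letter sign, got {sign!r}")
    if sign[0] not in POSITION_1 or sign[1] not in POSITION_2:
        raise ValueError(f"unknown vowel sign {sign!r}")
    backness, rounding = POSITION_1[sign[0]]
    return {
        "backness": backness,
        "rounding": rounding,
        "openness": POSITION_2[sign[1]],
    }


print(decode_vowel("ii"))
print(decode_vowel("ue"))
```

The same letter can carry a different feature in each position (for example 'i', 'e', 'o' and 'a' appear in both dictionaries), which is the reuse of character forms across distinct signs that the transcription system relies on. The sign er (schwa+r) does not follow the two-position pattern and is deliberately rejected by this sketch.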

Appendix B

[Tree diagram of the polyphone network. The first column, headed "Phonetic description", lists each core phone by its features (for example bilabial/plosive -v, alveolar/plosive -v, alveolar/affricate -v, velar/nasal +v, front -r/close, back +r/open-mid), including variants such as long, palatalized, labialized, pharyngealized, laryngealized, prenasalized, ejective, implosive, aspirated, nasal, and offglide forms. The remaining columns are headed by hierarchical level, starting at 15, and give the merged node reached at each level (for example tp^13 or pn^4) together with its bracketed confidence score. Rows cover the plosives, affricates, fricatives, lateral fricatives, lateral approximants, trills, taps, approximants, and nasals at each place of articulation, followed by the vowels.]

We claim:

1. A method to provide a polyphone network, comprising:
   selecting a plurality of spoken languages;
   identifying at least some phones for each of the plurality of spoken languages;
   providing a polyphone network that is a hierarchical phone correspondence network comprising the at least some phones, and that includes a low resolution level of phone information, wherein the phone information of the low resolution level is determined from corresponding higher resolution phone information levels as a function of at least:
      articulatorily defined phonetic similarity as between the higher resolution phone information level phones; and
      at least one language-independent phonological factor.

2. The method of claim 1, wherein selecting a plurality of spoken languages comprises selecting a plurality of spoken languages that includes at least two languages that are substantially typologically diverse with respect to one another.

3. The method of claim 2, wherein selecting a plurality of spoken languages that includes at least two languages that are substantially typologically diverse with respect to one another includes selecting at least five percent of all currently spoken languages.

4. The method of claim 3 wherein selecting at least five percent of all currently spoken languages includes selecting at least twenty-five percent of all currently spoken languages.

5. The method of claim 1 wherein identifying at least some phones for each of the plurality of spoken languages includes identifying at least some phonemes that are substantially unique to each of at least some of the plurality of spoken languages.

6. The method of claim 5 wherein identifying at least some phonemes that are substantially unique to each of at least some of the plurality of spoken languages includes identifying substantially all unique phonemes for substantially all of the plurality of spoken languages.

7. The method of claim 1 wherein providing a hierarchical phone correspondence network includes organizing merged phone selections in at least one tree.

8. The method of claim 1 wherein providing a hierarchical phone correspondence network includes organizing merged phone selections in at least one binary branching tree.

9. The method of claim 1 wherein providing a polyphone network includes providing a higher resolution view of phone information.

10. The method of claim 9 wherein providing a higher resolution view of phone information includes providing a higher resolution view of phone information wherein at least some of the phone information comprises phonemes for at least some of the plurality of spoken languages.

11. The method of claim 9 wherein providing a polyphone network further includes providing a high resolution view of phone information that is a less high resolution view than the higher resolution view of phone information.

12. The method of claim 9 wherein providing a polyphone network further includes providing a lower resolution view of phone information comprised of phone categories.

13. The method of claim 9 wherein providing a high resolution view of phone information includes providing a binary branching tree hierarchy.

14. The method of claim 13 wherein providing a hierarchical phone correspondence network comprising at least one binary branching tree includes providing a hierarchical phone correspondence network comprising at least one binary branching tree having at least one level of non-merged phones and at least five potential levels of hierarchically merged phone nodes.

15. The method of claim 14 wherein providing a hierarchical phone correspondence network comprising at least one binary branching tree having at least one level of non-merged phones and at least five potential levels of hierarchically merged phone nodes includes providing at least some merged phone nodes that each comprise a merger of a plurality of phone categories, which phone categories are themselves each most representative of a given plurality of phone categories.

16. The apparatus of claim 15 wherein the polyphone network is represented by standard, printable ASCII characters, none of which comprises a special character commonly having a corresponding computer operation command meaning nor to which Unix command line interpreters prescribe special significance, such that the polyphone network only requires a standard font set and substantially avoids programming collisions.

17. The method of claim 1 wherein the at least one language-independent phonological factor comprises at least one of language-independent phonological contrast, cross-linguistic phone frequency, predetermined linguistic tendencies, and predetermined linguistic universals.

18. The method of claim 1 wherein providing a polyphone network includes providing a polyphone network that is substantially unbiased towards any particular language.

19. The method of claim 18 wherein providing a polyphone network includes providing a polyphone network that is substantially unbiased towards any particular language family.

20. The method of claim 19 wherein providing a polyphone network includes providing a polyphone network that is substantially unbiased towards any particular language type.

21. The method of claim 1 wherein identifying at least some phones for each of the plurality of spoken languages includes representing the phones using a transcription system that comprises only standard, printable ASCII characters, none of which comprise a special character commonly having a corresponding computer operation command meaning nor to which Unix command line interpreters prescribe special significance, such that the transcription system characters only require a standard font set and substantially avoid programming collisions.

22. The method of claim 1 where identifying at least some phones includes representing the phones using a transcription system that comprises only standard, printable ASCII characters, none of which comprise a special character commonly having a corresponding computer operation command meaning nor to which Unix command line interpreters prescribe special significance, such that the transcription system characters only require a standard font set and substantially avoid programming collisions.

23. An apparatus comprising:
   a speech recognition engine;
   a memory operable coupled to the speech recognition engine and containing a set of models selected as a function, at least in part, of a polyphone network that is a hierarchical phone correspondence network comprised of a plurality of phones, which phones correspond to phones as used in a plurality of spoken languages and wherein relationships between the phones at lower resolution levels are represented hierarchically by nodes that correspond to merged phones from higher resolution levels, wherein merging of the phones to form the nodes is based, at least in part, upon articulatory phonetic similarity as between the phones and at least one language-independent phonological factor.

24. The apparatus of claim 23 wherein the apparatus comprises a wireless communications device.

25. The apparatus of claim 23 wherein the wireless communications device comprises a two-way wireless communications device.