Grapheme-To-Phoneme Models for (Almost) Any Language

Aliya Deri and Kevin Knight
Information Sciences Institute
Department of Computer Science
University of Southern California
{aderi, knight}@isi.edu

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 399–408, Berlin, Germany, August 7-12, 2016. © 2016 Association for Computational Linguistics

Abstract

Grapheme-to-phoneme (g2p) models are rarely available in low-resource languages, as the creation of training and evaluation data is expensive and time-consuming. We use Wiktionary to obtain more than 650k word-pronunciation pairs in more than 500 languages. We then develop phoneme and language distance metrics based on phonological and linguistic knowledge; applying those, we adapt g2p models for high-resource languages to create models for related low-resource languages. We provide results for models for 229 adapted languages.

1 Introduction

Grapheme-to-phoneme (g2p) models convert words into pronunciations, and are ubiquitous in speech- and text-processing systems. Due to the diversity of scripts, phoneme inventories, phonotactic constraints, and spelling conventions among the world’s languages, they are typically language-specific. Thus, while most statistical g2p learning methods are language-agnostic, they are trained on language-specific data, namely a pronunciation dictionary consisting of word-pronunciation pairs, as in Table 1.

lang   word      pronunciation
eng    anybody   e̞ n iː b ɒ d iː
pol    żołądka   z̻owon̪t̪ka
ben    শ嗍        s̪ ɔ k t̪ ɔ
heb    חלומות    ʁ a l o m o t

Table 1: Examples of English, Polish, Bengali, and Hebrew pronunciation dictionary entries, with pronunciations represented with the International Phonetic Alphabet (IPA).

Building such a dictionary for a new language is both time-consuming and expensive, because it requires expertise in both the language and a notation system like the International Phonetic Alphabet, applied to thousands of word-pronunciation pairs. Unsurprisingly, resources have been allocated only to the most heavily-researched languages. GlobalPhone, one of the most extensive multilingual text and speech databases, has pronunciation dictionaries in only 20 languages (Schultz et al., 2013).¹

¹ We have been unable to obtain this dataset.

For most of the world’s more than 7,100 languages (Lewis et al., 2009), no data exists and the many technologies enabled by g2p models are inaccessible.

Intuitively, however, pronouncing an unknown language should not necessarily require large amounts of language-specific knowledge or data. A native German or Dutch speaker, with no knowledge of English, can approximate the pronunciations of an English word, albeit with slightly different phonemes. Table 2 demonstrates that German and Dutch g2p models can do the same.

word    eng          deu         nld
gift    ɡ ɪ f tʰ     ɡ ɪ f t     ɣ ɪ f t
class   kʰ l æ s     k l aː s    k l ɑ s
send    s e̞ n d      z ɛ n t     s ɛ n t

Table 2: Example pronunciations of English words using English, German, and Dutch g2p models.

Motivated by this, we create and evaluate g2p models for low-resource languages by adapting existing g2p models for high-resource languages using linguistic and phonological information. To facilitate our experiments, we create several notable data resources, including a multilingual pronunciation dictionary with entries for more than 500 languages.

The contributions of this work are:
• Using data scraped from Wiktionary, we clean and normalize pronunciation dictionaries for 531 languages. To our knowledge, this is the most comprehensive multilingual pronunciation dictionary available.
• We synthesize several named entities corpora to create a multilingual corpus covering 384 languages.
• We develop a language-independent distance metric between IPA phonemes.
• We extend previous metrics for language-language distance with additional information and metrics.
• We create two sets of g2p models for “high resource” languages: 97 simple rule-based models extracted from Wikipedia’s “IPA Help” pages, and 85 data-driven models built from Wiktionary data.
• We develop methods for adapting these g2p models to related languages, and describe results for 229 adapted models.
• We release all data and models.

2 Related Work

Because of the severe lack of multilingual pronunciation dictionaries and g2p models, different methods of rapid resource generation have been proposed.

Schultz (2009) reduces the amount of expertise needed to build a pronunciation dictionary, by providing a native speaker with an intuitive rule-generation user interface. Schlippe et al. (2010) crawl web resources like Wiktionary for word-pronunciation pairs. More recently, attempts have been made to automatically extract pronunciation dictionaries directly from audio data (Stahlberg et al., 2016). However, the requirement of a native speaker, web resources, or audio data specific to the language still blocks development, and the number of g2p resources remains very low. Our method avoids these issues by relying only on text data from high-resource languages.

Rather than generating language-specific resources, we are inspired by research on cross-lingual automatic speech recognition (ASR) by Vu and Schultz (2013) and Vu et al. (2014), who exploit linguistic and phonetic relationships in low-resource scenarios. Although these works focus on ASR instead of g2p models and rely on audio data, they demonstrate that speech technology is portable across related languages.

3 Method

Given a low-resource language l without g2p rules or training data, we adapt resources (either an existing g2p model or a pronunciation dictionary) from a high-resource language h to create a g2p for l. We assume the existence of two modules: a phoneme-to-phoneme distance metric phon2phon, which allows us to map from the phonemes used by h to the phonemes used by l, and a closest language module lang2lang, which provides us with a related language h.

Using these resources, we adapt resources from h to l in two different ways:
• Output mapping (Figure 1a): We use g2p_h to pronounce word_l, then map the output to the phonemes used by l with phon2phon.
• Training data mapping (Figure 1b): We use phon2phon to map the pronunciations in h’s pronunciation dictionary to the phonemes used by l, then train a g2p model using the adapted data.

[Figure 1 panels: (a) training_h → g2p_h; word → g2p_h → pron_h → M_h→l → pron_l. (b) training_h → M_h→l → g2p_h→l; word → g2p_h→l → pron_l.]

Figure 1: Strategies for adapting existing language resources through output mapping (a) and training data mapping (b).

The next sections describe how we collect data, create phoneme-to-phoneme and language-to-language distance metrics, and build high-resource g2p models.

4 Data

This section describes our data sources, which are summarized in Table 3.

Resource               Languages   Scripts   Other contents
Phoible                1674        -         2155 lang. inventories; 2182 phonemes; 37 features
Wiki IPA Help tables   97          24        1753 graph. segments; 1534 phon. segments; 3389 unique g-p rules
Wiktionary             531         49        658k word-pron pairs
Wiktionary train       85          42        629k word-pron pairs
Wiktionary test        501         45        26k word-pron pairs
NE data                384         36        9.9m NEs

Table 3: Summary of data resources obtained from Phoible, named entity resources, Wikipedia IPA Help tables, and Wiktionary. Note that, although our Wiktionary data technically covers over 500 languages, fewer than 100 include more than 250 entries (Wiktionary train).

4.1 Phoible

Phoible (Moran et al., 2014) is an online repository of cross-lingual phonological data. We use two of its components: language phoneme inventories and phonetic features.

4.1.1 Phoneme inventories

A phoneme inventory is the set of phonemes used to pronounce a language, represented in IPA. Phoible provides 2156 phoneme inventories for 1674 languages. (Some languages have multiple inventories from different linguistic studies.)

4.1.2 Phoneme feature vectors

For each phoneme included in its phoneme inventories, Phoible provides information about 37 phonological features, such as whether the phoneme is nasal, consonantal, sonorant, or a tone. Each phoneme thus maps to a unique feature vector, with features expressed as +, -, or 0.

4.2 Named Entity Resources

For our language-to-language distance metric, it is useful to have written text in many languages. The most easily accessible source of this data is multilingual named entity (NE) resources.

We synthesize 7 different NE corpora: Chinese-English names (Ji et al., 2009), Geonames (Vatant and Wick, 2006), JRC names (Steinberger et al., 2011), corpora from LDC,² NEWS 2015 (Banchs et al., 2015), Wikipedia names (Irvine et al., 2010), and Wikipedia titles (Lin et al., 2011); to this, we also add multilingual Wikipedia titles [...] English translation, named entity type, and script information where possible.

4.3 Wikipedia IPA Help tables

To explain different languages’ phonetic notations, Wikipedia users have created “IPA Help” pages,³ which provide tables of simple grapheme examples of a language’s phonemes. For example, on the English page, the phoneme z has the examples “zoo” and “has.” We automatically scrape these tables for 97 languages to create simple grapheme-phoneme rules.

Using the phon2phon distance metric and mapping technique described in Section 5, we clean each table by mapping its IPA phonemes to the language’s Phoible phoneme inventory, if it exists. If it does not exist, we map the phonemes to valid Phoible phonemes and create a phoneme inventory for that language.

4.4 Wiktionary pronunciation dictionaries

Ironically, to train data-driven g2p models for high-resource languages, and to evaluate our low-resource g2p models, we require pronunciation dictionaries for many languages. A common and successful technique for obtaining this data (Schlippe et al., 2010; Schlippe et al., 2012a; Yao and Kondrak, 2015) is scraping Wiktionary, an open-source multilingual dictionary maintained by Wikimedia.
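
The two adaptation strategies from Section 3, output mapping and training data mapping, can be illustrated with a small sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: the feature vectors are hypothetical toy values rather than Phoible's 37 features, the distance is a plain count of mismatched features standing in for the phon2phon metric (which the paper defines in a later section), the target inventory is invented, and toy_g2p_h is a hypothetical dictionary lookup standing in for a trained high-resource g2p model. The pronunciation z ɛ n t for "send" is the German-model output from Table 2; the entry for "net" is made up.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy feature vectors using +1 / -1 / 0, in the order
# (consonantal, nasal, voiced, continuant, tense). Not Phoible's real values.
FEATURES: Dict[str, Tuple[int, ...]] = {
    "z": (1, -1, 1, 1, 0),
    "s": (1, -1, -1, 1, 0),
    "t": (1, -1, -1, -1, 0),
    "n": (1, 1, 1, -1, 0),
    "ɛ": (-1, -1, 1, 1, -1),
    "e": (-1, -1, 1, 1, 1),
}

Pron = List[str]


def feature_distance(p: str, q: str) -> int:
    """Count the feature positions where two phonemes differ (assumed metric)."""
    return sum(a != b for a, b in zip(FEATURES[p], FEATURES[q]))


def build_phon2phon(source: List[str], target: List[str]) -> Dict[str, str]:
    """Map each phoneme of h to its nearest phoneme in the inventory of l."""
    return {p: min(target, key=lambda q: feature_distance(p, q)) for p in source}


def output_mapping(word: str, g2p_h: Callable[[str], Pron],
                   mapping: Dict[str, str]) -> Pron:
    """Strategy (a): pronounce with the model for h, then map the output to l."""
    return [mapping[p] for p in g2p_h(word)]


def map_training_data(dictionary_h: List[Tuple[str, Pron]],
                      mapping: Dict[str, str]) -> List[Tuple[str, Pron]]:
    """Strategy (b): rewrite the dictionary of h into the phonemes of l.

    The adapted pairs would then be used to train a new g2p model.
    """
    return [(word, [mapping[p] for p in pron]) for word, pron in dictionary_h]


# Toy pronunciation dictionary for the high-resource language h.
DICTIONARY_H: List[Tuple[str, Pron]] = [
    ("send", ["z", "ɛ", "n", "t"]),
    ("net", ["n", "ɛ", "t"]),
]


def toy_g2p_h(word: str) -> Pron:
    """Hypothetical stand-in for a trained g2p model for h."""
    return dict(DICTIONARY_H)[word]


INVENTORY_H = ["z", "ɛ", "n", "t"]
INVENTORY_L = ["s", "t", "n", "e"]  # invented inventory lacking z and ɛ

MAPPING = build_phon2phon(INVENTORY_H, INVENTORY_L)
ADAPTED = map_training_data(DICTIONARY_H, MAPPING)

print(MAPPING)
print(output_mapping("send", toy_g2p_h, MAPPING))
print(ADAPTED)
```

In this toy setting both strategies give s e n t for "send". The difference is where the mapping is applied: output mapping reuses the existing model and post-processes its output, while training data mapping produces a dictionary from which a separate model for l would be trained.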