Introducing Libera for Japanese
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Proceedings of the Workshop on Multilingual Language Resources and Interoperability, Pages 1–8, Sydney, July 2006
COLING •ACL 2006 MLRI’06 Multilingual Language Resources and Interoperability Proceedings of the Workshop Chairs: Andreas Witt, Gilles Sérasset, Susan Armstrong, Jim Breen, Ulrich Heid and Felix Sasaki 23 July 2006 Sydney, Australia Production and Manufacturing by BPA Digital 11 Evans St Burwood VIC 3125 AUSTRALIA c 2006 The Association for Computational Linguistics Order copies of this and other ACL proceedings from: Association for Computational Linguistics (ACL) 209 N. Eighth Street Stroudsburg, PA 18360 USA Tel: +1-570-476-8006 Fax: +1-570-476-0860 [email protected] ISBN 1-932432-82-5 ii Table of Contents Preface .....................................................................................v Organizers . vii Workshop Program . ix Lexical Markup Framework (LMF) for NLP Multilingual Resources Gil Francopoulo, Nuria Bel, Monte George, Nicoletta Calzolari, Monica Monachini, Mandy Pet and Claudia Soria . 1 The Role of Lexical Resources in CJK Natural Language Processing Jack Halpern . 9 Towards Agent-based Cross-Lingual Interoperability of Distributed Lexical Resources Claudia Soria, Maurizio Tesconi, Andrea Marchetti, Francesca Bertagna, Monica Monachini, Chu-Ren Huang and Nicoletta Calzolari. .17 The LexALP Information System: Term Bank and Corpus for Multilingual Legal Terminology Consolidated Verena Lyding, Elena Chiocchetti, Gilles Sérasset and Francis Brunet-Manquat . 25 The Development of a Multilingual Collocation Dictionary Sylviane Cardey, Rosita Chan and Peter Greenfield. .32 Multilingual Collocation Extraction: Issues and Solutions Violeta Seretan and Eric Wehrli . 40 Structural Properties of Lexical Systems: Monolingual and Multilingual Perspectives Alain Polguère . 50 A Fast and Accurate Method for Detecting English-Japanese Parallel Texts Ken’ichi Fukushima, Kenjiro Taura and Takashi Chikayama . 60 Evaluation of the Bible as a Resource for Cross-Language Information Retrieval Peter A. -
Proceedings of the Workshop on Multilingual Language Resources and Interoperability, Pages 9–16, Sydney, July 2006
The Role of Lexical Resources in CJK Natural Language Processing Jack Halpern(春遍雀來) The CJK Dictionary Institute (CJKI) (日中韓辭典研究所) 34-14, 2-chome, Tohoku, Niiza-shi, Saitama 352-0001, Japan [email protected] Abstract 7. Chinese and Japanese proper nouns, which are The role of lexical resources is often un- very numerous, are difficult to detect without a derstated in NLP research. The complex- lexicon. ity of Chinese, Japanese and Korean 8. Automatic recognition of terms and their vari- (CJK) poses special challenges to devel- ants (Jacquemin 2001). opers of NLP tools, especially in the area of word segmentation (WS), information The various attempts to tackle these tasks by retrieval (IR), named entity extraction statistical and algorithmic methods (Kwok 1997) (NER), and machine translation (MT). have had only limited success. An important mo- These difficulties are exacerbated by the tivation for such methodology has been the poor lack of comprehensive lexical resources, availability and high cost of acquiring and main- especially for proper nouns, and the lack taining large-scale lexical databases. of a standardized orthography, especially This paper discusses how a lexicon-driven ap- in Japanese. This paper summarizes some proach exploiting large-scale lexical databases of the major linguistic issues in the de- can offer reliable solutions to some of the princi- velopment NLP applications that are de- pal issues, based on over a decade of experience pendent on lexical resources, and dis- in building such databases for NLP applications. cusses the central role such resources should play in enhancing the accuracy of 2 Named Entity Extraction NLP tools. -
Display of Japanese Language Features Within Ecoa Measures
Display of Japanese Language Features within eCOA Measures: Challenges and Recommendations Authors: Jonathan Norman, BA (Hons); Naoto Hasegawa, BA; Matthew Blackall, BA; Alisa Heinzman, MFA; Tim Poepsel, PhD; Rachna Kaul, MPA; Brittanie Newton, BA; Elizabeth McCullough, MA; Shawn McKown, MA OBJECTIVE DISCUSSION According to the World Health Organisation, after the US and China, Japan is home to the third highest number of clinical trials in the Kanji appearing with Chinese strokes rather than Japanese strokes (which RWS Life Sciences found to be the case in 57% of our world1. In fact, the number of trials being conducted in Japan increased by over 6,000% from 2001 (n=83) to 2017 (n=5,305). As a result convenience sample) is often caused by Chinese and Japanese eCOA builds being programmed to use the same font. Where a character of this, the use of Japanese Clinical Outcome Assessments (COAs) has become increasingly commonplace. The objective of this study only appears in Japanese, the system displays the character correctly as there is no other option. However, where a character appears was to describe and analyse two of the main challenges associated with the display of Japanese language features in electronic COAs in both Japanese and Chinese (as is the case with Kanji), some fonts will use the Chinese version only meaning that the character displays (eCOAs) and present recommendations for their resolution. incorrectly for Japan. Although Kanji characters displayed using Chinese strokes are understandable to a Japanese-speaking audience, it’s important to BACKGROUND remember how a COA is interpreted can impact the way certain respondents will interact with it. -
Chinese, Dutch, and Japanese in the Introduction of Western Learning in Tokugawa Japan
_full_alt_author_running_head (neem stramien B2 voor dit chapter en dubbelklik nul hierna en zet 2 auteursnamen neer op die plek met and): 0 _full_articletitle_deel (kopregel rechts, vul hierna in): Polyglot Translators _full_article_language: en indien anders: engelse articletitle: 0 62 Heijdra Chapter 6 Polyglot Translators: Chinese, Dutch, and Japanese in the Introduction of Western Learning in Tokugawa Japan Martin J. Heijdra The life of an area studies librarian is not always excitement. Yes, one enjoys informing bright graduate students of the latest scholarship, identifying Chi- nese rubbings of Egyptological stelae, or discussing publishing gaps in the cur- rent scholarship with knowledgeable editors; but it involves sometimes the mundane, such as reshelving a copy of a nineteenth-century Japanese transla- tion of a medical work by Johannes de Gorter.1 It was while performing the latter duty that I noticed something odd. The characters used to write Gorter were 我爾德兒, which indeed could be read as Gorter. That is, if read in modern Chinese; if read in the usual Sino-Japanese, it would be *Gajitokuji, something far from the Dutch pronunciation. A quick perusal of some scholars of rangaku 蘭學, “Dutch Studies,” revealed a general lack of awareness of this question, why a Dutch name in a nineteenth-century Japanese book would be read in modern Chinese. Prompted to write an article in honor of a Dutch editor of East and South Asian Studies, I decided to inves- tigate this more thoroughly. There are many aspects to consider, and I must confess that the final reason is hard to come by; but while I have not reached a final conclusion, I hope that in the future scholars will at least recognize the phenomenon when encountered. -
The Logical Linguistics
International Journal of Latest Research in Humanities and Social Science (IJLRHSS) Volume 01 - Issue 11, www.ijlrhss.com || PP. 10-48 The Logical Linguistics: Logic of Phonemes and Morae Contained in Speech Sound Stream and the Logic of Dichotomy and Dualism equipped in Neurons and Mobile Neurons Inside the Cerebrospinal Fluid in Ventricular System Interacting for the Unconscious Grammatical Processing and Constitute Meaning of Concepts. (A Molecular Level Bio-linguistic Approach) Kumon “Kimiaki” TOKUMARU Independent Researcher Abstract: This is an interdisciplinary integration on the origins of modern human language as well as the in- brain linguistic processing mechanism. In accordance with the OSI Reference Model, the author divides the entire linguistic procedure into logical layer activities inside the speaker‟s and listener‟s brains, and the physical layer phenomena, i.e. the speech sound stream or equivalent coded signals such as texts, broadcasting, telecommunications and bit data. The entire study started in 2007, when the author visitedthe Klasies River Mouth (KRM) Caves in South Africafor the first time, the oldest modern human site in the world.The archaeological excavation in 1967- 68 (Singer Wymer 1982) confirmed that (anatomically) modern humanshad lived inside the caves duringthe period120 – 60 Ka (thousand years ago), which overlaps with the timing of language acquisition, 75 Ka (Shima 2004). Following this first visitIstarted an interdisciplinary investigation into linguistic evolution and the in- brain processing mechanism of language through reading various books and articles. The idea that the human language is digital was obtained unexpectedly. The Naked-Mole Rats (Heterocephalus glaber) are eusocial and altruistic. They live underground tunnel in tropical savanna of East Africa. -
Exaptation, Grammaticalization, and Reanalysis
Heiko Narrog Tohoku University Exaptation, Grammaticalization, and Reanalysis Abstract The goal of this paper is to argue that exaptation, as introduced into the study of language change by Lass (1990, 1997), in specific functional domains, is a limited alternative to grammaticalization. Exaptation, similarly to grammaticalization, leads to the formation of grammatical elements. Like grammaticalization, exaptation is based on the mechanism of reanalysis. It decisively differs from grammaticalization, however, as it implies change in the opposite direction, namely from material absorbed in the lexicon back to grammatical material. Two sets of data are presented as evidence for the replicability of this process. One involves the occurrence of exaptation across languages in a specific semantic domain, namely, the evolution of morphological causatives out of lexical verb patterns. The other data pertain to recurrent processes of exaptation in one language, namely in Japanese, where exaptation figures in the development of various morphological categories. In all cases of exaptation, reanalysis is crucially involved. This serves to show that reanalysis may be more fundamental to grammatical change than both grammaticalization and exaptation. Furthermore, it allows for change both with the usual directionality of grammaticalization and against it. 1. Introduction Even detractors of grammaticalization theory do not seriously challenge the fact that in the majority of cases the morphosyntactic and semantic development of grammatical material follows the paths outlined in the standard literature on grammaticalization. The two following California Linguistic Notes Volume XXXII No. 1 Winter, 2007 2 issues, however, potentially pose a critical challenge to the validity of the theory. First, there is the question of the theoretical status of grammaticalization as a coherent and unique concept. -
<Strong><Em>English Grammar for Students of Japanese
UCLA Issues in Applied Linguistics Title English Grammar for Students of Japanese (The Study Guide for Those Learning Japanese) by Mutsuko Endo Hudson. Ann Arbor: The Olivia and Hill Press, 1994 vii + 204 pp. Permalink https://escholarship.org/uc/item/29p5w1zb Journal Issues in Applied Linguistics, 5(2) ISSN 1050-4273 Author Mishina, Satomi Publication Date 1994-12-30 DOI 10.5070/L452005194 Peer reviewed eScholarship.org Powered by the California Digital Library University of California English Grammar for Students of Japanese (The Study Guide for Those Learning Japanese) by Mutsuko Endo Hudson. Ann Arbor: The Olivia and Hill Press, 1994 vii + 204 pp. Reviewed by Satomi Mishina University of California, Los Angeles English Grammar for Students ofJapanese provides a concise explanation of the key concepts and terminology of English and Japanese grammar. The title of this book may be somewhat misleading, since Hudson does not necessarily emphasize the grammar of English, but rather affords equal emphasis to the grammars of both English and Japanese. The description of English grammar and the contrastive presentation of the two grammar systems are intended to help students learning Japanese to understand its basic grammatical notions in hght of the grammar of their own native language, which, the author assumes, faciUtates the understanding of a foreign language grammar. The grammar points are addressed in separate chapters in the following basic order: parts of speech (e.g., nouns, verbs), inflections, various sentence level phenomena (e.g., subject, topic), sentence type (e.g., affirmative vs. negative, declarative vs. interrogative), tense (e.g., present tense, past tense), voice (e.g., active, passive), and types of clauses (e.g., conditional clauses, relative clauses). -
Porting Grammars Between Typologically Similar Languages: Japanese to Korean Roger KIM Mary DALRYMPLE Palo Alto Research Center Dept
Porting Grammars between Typologically Similar Languages: Japanese to Korean Roger KIM Mary DALRYMPLE Palo Alto Research Center Dept. of Computer Science 3333 Coyote Hill Rd. King's College London Palo Alto, CA 94304 USA Strand, London WC2R 2LS UK [email protected] [email protected] Ronald M. KAPLAN Tracy Holloway KING Palo Alto Research Center Palo Alto Research Center 3333 Coyote Hill Rd. 3333 Coyote Hill Rd. Palo Alto, CA 94304 USA Palo Alto, CA 94304 USA [email protected] [email protected] Abstract We report on a preliminary investigation of the dif®culty of converting a grammar of one lan- guage into a grammar of a typologically similar language. In this investigation, we started with the ParGram grammar of Japanese and used that as the basis for a grammar of Korean. The re- sults are encouraging for the use of grammar porting to bootstrap new grammar development. 1 Introduction The Parallel Grammar project (ParGram) is an international collaboration aimed at producing broad-cov- erage computational grammars for a variety of languages (Butt et al., 1999; Butt et al., 2002). The gram- mars (currently of English, French, German, Japanese, Norwegian, and Urdu) are written in the frame- work of Lexical Functional Grammar (LFG) (Kaplan and Bresnan, 1982; Dalrymple, 2001), and they are constructed using a common engineering and high-speed processing platform for LFG grammars, the XLE (Maxwell and Kaplan, 1993). These grammars, as do all LFG grammars, assign two levels of syntac- tic representation to the sentences of a language: a super®cial phrase structure tree (called a constituent structure or c-structure) and an underlying matrix of features and values (the functional structure or f- structure). -
Jack Halpern (Hg.): New ]Apanese-English Character Dictionary
Jack Halpern (Hg.): New ]apanese-English Character Dictionary. WT~~t~~~. Tokyo: Kenkyusha, 1990. 226, 1992 S. Besprochen von Jürgen 5talph Das lange vorbereitete, von vielen Seiten geförderte, in tausend Vorträgen besungene, das unermüdlich vorangekündigte Kanjilexikon vonJack Hal pern, es ist da. Massiv, stattlich, gelb-blau und: überflüssig. Aber der Reihe nach. Was wurde uns, was wurde allen, die sich mit der japanischen Sprache befassen oder befassen wollen, versprochen, und was verspricht der Schutzumschlag noch immer? "A major event in Japa nese language studies. A significant step forward in the methods, tools, and concepts used in compiling character dictionaries. The first treatment of kanji compiled by computational lexicography on the basis of sound linguistic principles". Weiter heißt es: This is the first kanji dictionary edited and produced entirely by com puter. The project has cost more than US $ 1,600,000 and required some sixty-six man-years for completion over aperiod of sixteen years. Great pains were taken to ensure accuracy and up-to-dateness: some 700 computer programs were written for editing and proofread ing the data, while Japanese language experts in the U.S. and Japan have confirmed its scholarly accuracy and lent their enthusiastic sup port. The dictionary also benefited from the support of various Japa nese government organizations and officials, including a former prime minister. Eins Komma sechs Millionen US-Dollar: welch finanzieller Aufwand! Sechsundsechzig Mannjahre in sechzehn Jahren: welch Aufwand an Zeit und Mühe! Sprachexperten in Nordamerika und Japan, dazu ein japani scher Premier: welch personelles Aufgebot, welch Enthusiasmus! Was Wunder auch: Das Buch bietet ja - laut Schutzumschlag - nichts weniger als eine lexikographische Revolution. -
CALICO Software Review
CALICO Software Review CALICO Journal, Volume 19 Number 2, pp. 390-404 Vocabulary/Kanji/Conjugation Exercise for Japanese (VKC/J2.0) Eiko Ushida - Carnegie Mellon University Product at a glance Product type: Vocabulary practice; kanji (Chinese characters); verb and adjective forms exercises; and an authoring system Language: Japanese Level: Elementary level Activity: Word-level practice/exercises Media format: WWW download Computer platform: Macintosh OS 7+ Hardware MAC: 68030+ requirements RAM: 8Mb 8Mb Harddisk space Supplemental Japanese Language Kit (JLK) or Japanese system is required for software creating own vocabulary databases. HyperCard Player is requirements: downloaded automatically with VKC/J2.0. Price: Freeware General Description Vocabulary/Kanji/Conjugation Exercise for Japanese (VKC/J2.0) is a useful program for practicing vocabulary, kanji and verb/adjective conjugation for elementary level learners of Japanese. It consists of the VKC/J2.0 system, VKC/J2.0 editors, data sets for two major Japanese language textbooks (i.e., Nakama 1 & 2, Situational Functional Japanese) and the VKC database. These are all freeware so that any one can download them from the VKC/J2.0 homepage, which introduces features of the software as well as the instructions. The VKC/J2.0 system includes the program starter and example exercises. The VKC/J2.0 editors include three types of data editors; vocabulary, kanji, and sound. VKC/J2.0 creates a vast number of exercises on vocabulary, kanji and verb/adjective conjugation based on the selected database, allowing students to practice as much as they would like. It also has a quiz function for vocabulary and kanji exercises, where students can test their progress. -
Linguist Expands the Horizons of Language Analysis with New Book 13 March 2017, by Peter Dizikes
Linguist expands the horizons of language analysis with new book 13 March 2017, by Peter Dizikes Many linguistics scholars regard the world's Because English is the native language of so many languages as being fundamentally similar. Yes, the great linguists, he observes, there is a tendency to characters, words, and rules vary. But underneath regard it as a template for other languages. But it all, enough similar structures exist to form what drawing more heavily on additional languages, MIT scholars call universal grammar, a capacity for Miyagawa thinks, could lead to new insights about language that all humans share. the specific contents of our universal language capacity; he cites the work of MIT linguist Norvin To see how linguists find similariites that can elude Richards as an example of this kind of work. the rest of us, consider a language operation called "allocutive agreement." This is a variation of "Given the prominence of Indo-European standard subject-verb agreement. Normally, a verb languages, especially English, in linguistic theory, ending simply agrees with the subject of a one sometimes gets the impression that if sentence, so that in English we say, "You go," but something happens in English it's due to universal also, "She goes." grammar, but if something happens in Japanese, it's because it's Japanese," Miyagawa says. Allocutive agreement throws a twist into this procedure: Even a third-person verb ending, such Not mere formalities as "she goes," changes depending on the social status of the person being spoken to. This To see why allocutive agreement seems like such a happens in Basque, for one. -
A Mathematical Theory of Language
International Journal of Contemporary Education Vol. 1, No. 1; April 2018 ISSN 2575-3177 E-ISSN 2575-3185 Published by Redfame Publishing URL: http://ijce.redfame.com A Mathematical Theory of Language Hans Gua Correspondence: Hans Gua, Yeda Village, Xiyu Town, Xihe County, Longnan, Gansu 742100, China. Received: November 8, 2017 Accepted: December 24, 2017 Online Published: December 27, 2017 doi:10.11114/ijce.v1i1.2893 URL: https://doi.org/10.11114/ijce.v1i1.2893 Abstract Modern linguistics cannot define and identify the best or standard pronunciations, writing and grammar. The choice and decision of sole human unified standard official or common language cannot be solved by modern linguistics generally. The current linguistics is no longer met the human developments because it can’t answer what the best language is. The sole unified standard official global English cannot appear because of English linguistic level and shortcoming mainly. English linguistics comes to predominate in the contemporary era. Negating and improving the current linguistics must be negated and improved English linguistics first. The finite numerals are expressed infinite quantities in the mathematics. The finite sounds are represented the infinite meanings in the language. The theory and method are almost same in the mathematics and linguistics generally. The linguistics is a branch or concrete application of information theory (IT). IT is based on the probability theory and statistics generally. A meaning is often certain code, string of codes or mathematical value in the language. Defining or explaining certain meaning of language is measured and calculated a mathematical value actually. A subsystem or subtopic such as language teaching is often based on the general linguistics.