Extraction of Neologisms from Japanese Corpora


A thesis presented by James Breen to The School of Computing and Information Systems in partial fulfillment of the requirements for the degree of Doctor of Philosophy

University of Melbourne, Melbourne, Australia, December 2017

©2017 - James Breen. All rights reserved.

Thesis advisers: Timothy Baldwin, Francis Bond
Author: James Breen

Abstract

In this thesis an exploration of the application of natural-language processing techniques to the extraction of neologisms from Japanese corpora is described. The research aim was to establish techniques which can be developed and exploited to assist significantly in neologism extraction for compiling Japanese monolingual and bilingual dictionaries. The particular challenge of the task is presented by the lack of word boundaries in Japanese text which creates a problem in the identification of unrecorded words. Three broad approaches have been explored, using a variety of language processing and artificial intelligence techniques, and drawing on large-scale Japanese corpora and reference lexicons: synthesis of possible Japanese words by mimicking Japanese morphological processes, followed by testing for the presence of candidate words in Japanese corpora; analysis of morpheme sequences in Japanese texts to determine the presence of potential new or unrecorded terms; and analysis of language patterns which are often used in Japanese in association with new and emerging terms. The research described in this thesis has identified a number of processes which can be used to assist lexicographers in the identification of unrecorded lexical items in Japanese texts.

Contents

Title Page
Abstract
Table of Contents
Abbreviations used in this Thesis
Citations to Previously Published Work
Acknowledgments
1 Introduction
  1.1 Thesis Overview
  1.2 Background, Relevance and Importance of the Project
    1.2.1 Japanese Orthography
    1.2.2 Neologism Formation in Japanese
    1.2.3 Project Goals
  1.3 Conceptual Framework of the Study, Summary of Experimental Methods
  1.4 Lexicographic Background
  1.5 Thesis Structure and Timeline
2 Neologisms: Lexicographic Issues and Terminology
  2.1 Introduction
  2.2 Nomenclature
  2.3 What Makes Up A Lexical Item?
  2.4 The Japanese Perspective
  2.5 Single Items
  2.6 Multiword Expressions
  2.7 Potential Lexical Items
  2.8 Summary
3 Prior Work and Literature Review
  3.1 Introduction
  3.2 General Lexicography
  3.3 Japanese NLP - Morphological Analyzers and Parsers
  3.4 Alternative Segmentation Approach
  3.5 Identification of Unknown Words
  3.6 Pre-candidature Work
4 Resources
  4.1 Introduction
  4.2 Dictionaries and General Lexicons
  4.3 Text Corpora
  4.4 n-gram Corpora
  4.5 Software
    4.5.1 Morphological Analyzers
    4.5.2 Morpheme Lexicons
    4.5.3 Machine-Learning Systems
5 Lexical Item Identification From Morpheme Analysis
  5.1 Introduction
  5.2 Morphological Analysis of Japanese Text
  5.3 Prior Work
  5.4 Using Unknown Word Functions in Morphological Analyzers
  5.5 Rule-Based Lexical Item Identification
  5.6 Application of Machine Learning to Lexical Item Extraction
    5.6.1 Basic Model
    5.6.2 Labelling
    5.6.3 Feature Development
  5.7 Evaluation
    5.7.1 Overview
    5.7.2 Testing with Automatically Marked-up Texts
    5.7.3 Initial Testing
    5.7.4 Testing with Hand-Annotated Texts
    5.7.5 Precision Issues
    5.7.6 Impact of Training Texts
    5.7.7 Impact of Training Size
  5.8 Testing for Potential Lexical Items in Unseen Texts
  5.9 Summary, Discussion and Conclusions
  5.10 Postscript: Alternative Chunking Approach
6 Japanese Loanword Multi-Word Expressions: Extraction, Segmentation and Translation
  6.1 Introduction
  6.2 Orthographical Aspects of Loanwords in Japanese
  6.3 Assimilation of Loanwords in Japanese
  6.4 Loanword Multi-Word Expressions
  6.5 Prior Work
    6.5.1 Segmentation
    6.5.2 Non-English Words
    6.5.3 Pseudo-English Constructions
    6.5.4 Orthographical Variants
    6.5.5 Polysemy in Loanwords
  6.6 Extraction of Loanwords from Japanese Text
  6.7 Segmentation and MWE Translation
  6.8 Evaluation
    6.8.1 Segmentation
    6.8.2 Translation
  6.9 Summary, Discussion and Possible Improvements
    6.9.1 Online CLST Interface
7 Neologism Synthesis
  7.1 Introduction
  7.2 Prior Work
  7.3 Approaches to Neologism Synthesis
  7.4 Resources
  7.5 Evaluation of Synthesized Compounds
    7.5.1 Initial Investigations
    7.5.2 Classification of Synthesized Compounds
    7.5.3 Initial Testing
    7.5.4 Initial Testing - Results
    7.5.5 Initial Testing - Discussion
  7.6 Morpheme-based Abbreviation Construction
    7.6.1 Background
    7.6.2 Initial Testing
    7.6.3 Extended testing
    7.6.4 Discussion
  7.7 Affixation
  7.8 Generalized Compound Creation - 2-kanji Compounds
    7.8.1 Introduction
    7.8.2 Initial Tests
    7.8.3 Extended Investigations
    7.8.4 Discussion of 2-kanji Synthesis and Evaluation
  7.9 Generalized Compound Creation - 4-kanji Compounds
    7.9.1 Introduction
    7.9.2 Compound Synthesis
    7.9.3 Initial Investigation
    7.9.4 Development of Training Data
    7.9.5 Testing
    7.9.6 Discussion of 4-kanji Synthesis and Evaluation
  7.10 Summary, Discussion and Conclusions
    7.10.1 Future Work
8 Generation and Extraction of Compound Verbs
  8.1 Introduction
  8.2 Overview of Japanese Compound Verbs
  8.3 Prior Work
  8.4 General Approach and Resources
    8.4.1 Approaches
    8.4.2 Resources Used
    8.4.3 Synthesis of Compound Verbs
    8.4.4 Direct n-gram Search
  8.5 Analysis of the Potential Compound Verbs
  8.6 Productivity Measures of V1 and V2 Components
  8.7 Summary, Conclusions and Future Work
9 Neologism Identification through Language Contexts
  9.1 Introduction
  9.2 Prior Work
  9.3 Text Corpora
  9.4 Initial Exploration
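
The first approach named in the abstract (synthesize possible words, then test whether they occur in corpora) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the three-kanji inventory, the one-entry lexicon and the two-sentence corpus are invented, and raw substring counting stands in for whatever corpus lookup the thesis actually uses.

```python
from itertools import product


def synthesize_candidates(kanji, lexicon):
    """Pair every kanji with every kanji; drop strings the lexicon already records."""
    return [a + b for a, b in product(kanji, repeat=2) if a + b not in lexicon]


def attested(candidates, corpus, min_count=1):
    """Count raw substring occurrences of each candidate across the corpus texts."""
    counts = {c: sum(text.count(c) for text in corpus) for c in candidates}
    return {c: n for c, n in counts.items() if n >= min_count}


# Toy data, invented for illustration only (hypothetical lexicon and corpus).
kanji = ["婚", "活", "就"]
lexicon = {"就活"}
corpus = ["最近は婚活を始める人が多い。", "就活と婚活は違う。"]

candidates = synthesize_candidates(kanji, lexicon)
print(attested(candidates, corpus))
```

Because Japanese text has no word boundaries, a substring match like this can also fire on character sequences that straddle two words, so the attested strings are only candidates for a lexicographer to review.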
Recommended publications
  • Jim Breen's Japanese Page
    Jim Breen’s Japanese Page. Bill Gordon, November 1, 2000.
    Jim Breen’s Japanese Page (http://www.csse.monash.edu.au/~jwb/japanese.html) offers an abundance of resources related to Japanese language and culture. The web site’s main features include an online Japanese-English dictionary server, a frontend interface to the dictionary to translate Japanese text in web pages, an FTP archive of software and files related primarily to the Japanese language, an extensive listing of links to other web sites, and a gallery of ukiyoe prints (Japanese multicolored wood-block prints). This essay considers the effectiveness of Jim Breen’s Japanese Page and his electronic dictionary by evaluating them against standard criteria for web resources and by performing a comparison with other print and electronic resources. The first section compares the features of his electronic dictionary and translation software with print dictionaries, portable electronic dictionaries, and other web dictionaries and translation software. The second part of this essay evaluates his web site by using standard criteria such as authority, scope, organization, content, and reception by target audience. The final section discusses some unique aspects of using the web for the types of resources available on Jim Breen’s Japanese Page.
    1. Comparison With Other Resources
    Japanese-English dictionaries in print have certain useful features that have not yet been reproduced on the web. In addition to print dictionaries, since the early 1990s electronic dictionaries have gained popularity very rapidly because of their speed, portability, and unique features. Besides Breen’s dictionary and web page translation aid, other companies and individuals offer online Japanese-English dictionaries and web page translation.
  • The Orthographic Characterization of Rendaku and Lyman's
    Kawahara, Shigeto. 2018. Phonology and orthography: The orthographic characterization of rendaku and Lyman’s Law. Glossa: a journal of general linguistics 3(1): 10. 1–24, DOI: https://doi.org/10.5334/gjgl.368
    Shigeto Kawahara, The Keio Institute of Cultural and Linguistic Studies, Keio University, 2-15-45 Minato-ku, Mita, Tokyo, JP
    This paper argues that phonology and orthography go in tandem with each other to shape our phonological behavior. More concretely, phonological operations are non-trivially affected by orthography, and phonological constraints can refer to them. The specific case study comes from a morphophonological alternation in Japanese, rendaku. Rendaku is a process by which the first consonant of the second member of a compound becomes voiced (e.g., /oo/ + /tako/ → [oo+dako] ‘big octopus’). Lyman’s Law blocks rendaku when the second member already contains a voiced obstruent (/oo/ + /tokage/ → *[oo+dokage], [oo+tokage] ‘big lizard’). Lyman’s Law, as a constraint which prohibits a morpheme with two voiced obstruents, is also known to trigger devoicing of geminates in loanwords (e.g. /beddo/ → [betto] ‘bed’). Rendaku and Lyman’s Law have been extensively studied in the past phonological literature. Inspired by recent work that shows the interplay between orthographic factors and grammatical factors in shaping our phonological behaviors, this paper proposes that rendaku and Lyman’s Law actually operate on Japanese orthography. Rendaku is a process that assigns dakuten diacritics, and Lyman’s Law prohibits morphemes with two diacritics.
  • The Japanese Writing Systems, Script Reforms and the Eradication of the Kanji Writing System: Native Speakers’ Views Lovisa Österman
    The Japanese writing systems, script reforms and the eradication of the Kanji writing system: native speakers’ views. Lovisa Österman, Lund University, Centre for Languages and Literature. Bachelor’s Thesis, Japanese B.A. Course (JAPK11, Spring term 2018). Supervisor: Shinichiro Ishihara.
    Abstract
    This study aims to deduce what Japanese native speakers think of the Japanese writing systems, and in particular what native speakers’ opinions are concerning Kanji, the logographic writing system which consists of Chinese characters. The Japanese written language has something that most languages do not; namely a total of three writing systems. First, there is the Kana writing system, which consists of the two syllabaries: Hiragana and Katakana. The two syllabaries essentially figure the same way, but are used for different purposes. Secondly, there is the Rōmaji writing system, which is Japanese written using Latin letters. And finally, there is the Kanji writing system. Learning this is often at first an exhausting task, because not only must one learn the two phonematic writing systems (Hiragana and Katakana), but to be able to properly read and write in Japanese, one should also learn how to read and write a great amount of logographic signs; namely the Kanji. For example, to be able to read and understand books or newspapers without using any aiding tools such as dictionaries, one would need to have learned the 2136 Jōyō Kanji (regular-use Chinese characters). With the twentieth century’s progress in technology, compared with twenty years ago, in this day and age one could probably theoretically get by alright without knowing how to write Kanji by hand, seeing as we are writing less and less by hand and more by technological devices.
  • How to Learn Japanese Simon Reynolds How to Learn Japanese
    How to Learn Japanese, by Simon Reynolds. Copyright 2007 by Simon Reynolds. All Rights Reserved. No part of this book may be used, reproduced or transmitted in any manner whatsoever (electronic or mechanical, including photocopying, recording or by any system of storing and retrieving information) without written permission from the publisher, except for brief quotations embodied in reviews. Website: http://sprstrikesback.googlepages.com/home. Manufactured in the U.K. First Edition: 2007. Book and cover design by Simon Reynolds and Yuka Reynolds.
    TABLE OF CONTENTS
    1. WHY LEARN JAPANESE?
    2. LEARNING TO LEARN: Where to start; Should I learn to read and write Japanese?; Approaches to learning; Finding a teacher; Language schools; Language exchange; Self-study; Self study tips; Building vocabulary; Learning grammar; Listening; What did you say?; Speaking; Confidence; Less is more; Tips on starting a conversation; Get out of jail free; Troubleshooting; Slang; Practice; Writing
    3. PERFECTING PRONUNCIATION: Vowel sounds; Intonation; Thinking in syllables; Small tsu; Dots and circles; Combined syllables; Su; Ha and he; Common mistakes; Homonyms; Pronunciation practice
    4. WRITING RIGHT: Stroke order; Learning the kana; Flashcards; Installing Japanese fonts on your computer; Learning Kanji; How many kanji do I need?; Approaches to learning kanji; Component analysis AKA the fast track; Using the internet; Learning the pronunciations; Kanji town; Kanji game; Buying a kanji dictionary; Starting to read; Audio books; More reading on the web; Japanese tests; JLPT; J-test; Kanji test
    5.
  • Unconscious Gairaigo Bias in EFL: a Case Study of Japanese Teachers of English
    Unconscious Gairaigo Bias in EFL: A Case Study of Japanese Teachers of English. Mark Spring.
    Keywords: gairaigo, loanwords, cross-linguistic transfer, bias, traditional teaching
    1. Introduction
    'Japanese and English speakers find each other's languages hard to learn' (Swan and Smith, 2001: 296). This is probably due in no small part to the many linguistic differences between the seemingly unrelated languages. Huge levels of word borrowing, though, have led to an abundance of loanwords in the current Japanese lexicon, many originating from English. These are known locally in Japan as gairaigo. Indeed, around half of the three thousand most common words in English and around a quarter of those on The Academic Word List (see Coxhead, 2000) correspond in some form to gairaigo (Daulton, 2008: 86). Thus, to some degree we can say that the two languages are 'lexically wed' (ibid: 40). With increasing recognition among researchers of the positive role that the first language plays in the learning of a second, and a growing number of empirical studies indicating gairaigo knowledge can facilitate English acquisition, there have been calls to exploit these loanwords for the benefit of Japanese learners of English. But despite research supporting a role for them in learning English, it is said that 'many or most Japanese teachers of English avoid using gairaigo in the classroom' (Daulton, 2011: 8) due to a 'gairaigo bias' (ibid.). This may stem from unfavourable social attitudes towards loanwords themselves in the Japanese language, and pedagogical concerns over their negative influences on learning. Should this be true, it would represent a position incongruous with the idea of exploiting cross-linguistic lexical similarities.
  • Sociophonetic Variation at the Intersection of Gender, Region, and Style in Japanese Female Speech
    SOCIOPHONETIC VARIATION AT THE INTERSECTION OF GENDER, REGION, AND STYLE IN JAPANESE FEMALE SPEECH
    A Dissertation submitted to the Faculty of the Graduate School of Arts and Sciences of Georgetown University in partial fulfillment of the requirements of the degree of Doctor of Philosophy in Linguistics. By Sakiko Kajino, M.S. Washington, D.C., March 18th, 2014. Copyright 2014 by Sakiko Kajino. All Rights Reserved. Dissertation Advisors: Natalie Schilling, Ph.D. and Robert J. Podesva, Ph.D.
    ABSTRACT
    This dissertation is a sociophonetic study of 46 female Japanese speakers from three major metropolitan regions: Tokyo, Kyoto, and Osaka. While previous work on Japanese Women's Language assumes a monolithic speech variety, this study shows that women in the three regions exhibit strikingly different speech patterns. Rather than constructing a uniform gender identity, Japanese women produce gendered figures that typify particular geographic regions while negotiating the regional stereotypes. Three phonetic features in 25 dyadic conversation recordings of 46 participants are analyzed quantitatively and qualitatively: breathy voice, acoustic characteristics of voiceless sibilant fricatives /s/ (e.g. sumi ‘charcoal’) and /ɕ/ (e.g. shumi ‘hobby’), and intonational patterns (accented vs. deaccented) of negative polar questions (e.g. amaku nai? ‘isn’t [this] sweet?’). The analyses present the cross-regional patterning as well as intra-regional variation using the mixed-method technique with sociolinguistic variationist analysis, close examination of conversations, and ethnographic approach. The cross-regional analyses, which present big-picture patterns for the three phonetic features, show the following: 1) A feature that is considered to mark gender (i.e.
  • Quick Reference
    Quick Reference
    Before Initial Use
    ■ Insert the batteries
    1 Turn off the device and position your thumbs on the arrows on the battery compartment cover located on the bottom of the device. Push the cover in the direction of the arrows to remove the cover.
    2 Insert the two attached AAA size batteries and make sure that their poles (+ and -) are correctly aligned.
    3 Install the battery cover again.
    ■ Reset device
    1 Press Reset on the bottom of the device.
    2 Open the device cover and adjust the display angle for the best visibility.
    3 A message that reads "システムを初期化しますか?/Do you want to reset?" appears. To initialize the system, select "はい" (Yes) and press the button.
    4 A message that reads "タッチスクリーン補正 をスタイラスでタップしてください" (Touch screen calibration. Please tap mark) appears on the display. With the stylus pen, tap the four marks at the corners; they then disappear.
    5 After the display for the battery type setting and for the contrast adjustment setting, the menu is displayed, and the device is ready for use.
    Key Functions
    1 Menu Key
    2 Multiple Search Key
    3 • On/Off Key • Press and then to use back light function.
    4 Shift Key
    • Bookmark words/phrases in main text screen.
  • The Functions and Evolution of Topic and Focus Markers
    The Functions and Evolution of Topic and Focus Markers, by Paula Kadose Radetzky. B.A. (Columbia University) 1991; M.A. (University of California, Berkeley) 1996. A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Linguistics in the GRADUATE DIVISION of the UNIVERSITY OF CALIFORNIA, BERKELEY. Committee in charge: Professor Richard A. Rhodes, Co-chair; Professor Eve E. Sweetser, Co-chair; Professor H. Mack Horton. Spring 2002. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
    Abstract
    This dissertation examines the notions of topic and focus from both synchronic and diachronic points of view. Previous works have almost exclusively treated these concepts synchronically, and the historical studies which do exist have not successfully traced and motivated the individual stages of development. The sections on topic first propose and give cross-linguistic evidence for the following path of grammaticalization: locative/ contrastive topic marker > marker > marker. This overview is followed by two text-based studies, one of the Japanese topic marker wa and the other of the Greek particle de. Because of their long written traditions, these two languages allow us to contextually view and motivate the intermediate stages of grammaticalization. The last part of the dissertation is a discussion of focus. It begins by developing a synchronic theory involving different levels of highlighting, and then it presents case studies of data primarily from Japanese and Korean, examining in detail the mechanisms by which demonstratives and copulas become focus markers in these languages.
  • A Study of Loan Color Terms Collocation in Modern Japanese
    A Study of Loan Color Terms Collocation in Modern Japanese. Anna V. Bordilovskaya, Graduate School of Humanities, Kobe University, 1-1 Rokkodai-machi, Nada-ku, Kobe 657-8501 JAPAN.
    Abstract
    The Japanese lexicon consists of Japanese-origin words (WAGO), Chinese-origin words (KANGO) and words borrowed from English and other European languages (GAIRAIGO). The acquisition of words from three sources results in the abundance of near synonyms without any clear rules when a particular synonym should be used. Loveday has hypothesized that WAGO/KANGO and GAIRAIGO concrete nouns are used to address similar phenomena of Japanese and Western origins, respectively. This is referred as Hypothesis of Foreign vs. Native Dichotomy (HFND). However, the matter of abstract nouns, adjectivals and their collocations remains unstudied. In contrast to the previous studies, based on questionnaires, our approach stems from statistical analysis of corpus data. Our results illuminate a distinguishable bias in the structure of collocations – nouns and adjectivals of the same origin tend to appear together more often than the ones of the different origins.
    English loanwords in Japanese have been a topic of various studies by both native and foreign linguists for about 100 years. Some researchers are more interested in the assimilation processes of loanwords (Kay, 1995; Irwin, 2011), other linguists focus on semantic changes (Daulton, 2008), third mainly study sociolinguistic background and functions (Loveday, 1986, 1996). At present, the number of GAIRAIGO is increasing rapidly and loanwords penetrate into different spheres of life. Dictionaries (Katakanago Jiten Consaizu (The Concise Dictionary of Katakana Words), etc.) in most cases do not state any clear differences in the meaning and usage for the abovementioned near synonyms. On the other hand, the experience of studying and communicating in Japanese shows that it is not possible to substitute WAGO/KANGO and GAIRAIGO near synonyms
  • Affectedness Constructions: How Languages Indicate Positive and Negative Events
    Affectedness Constructions: How languages indicate positive and negative events, by Tomoko Yamashita Smith. B.S. (Doshisha Women’s College) 1987; M.A. (San Jose State University) 1995; M.A. (University of California, Berkeley) 1997. A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Linguistics in the GRADUATE DIVISION of the UNIVERSITY OF CALIFORNIA, BERKELEY. Committee in charge: Professor Eve E. Sweetser (Chair), Professor Soteria Svorou, Professor Charles J. Fillmore, Professor Yoko Hasegawa. Fall 2005. Copyright 2005 by Tomoko Yamashita Smith. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
    Abstract
    This dissertation is a cross-linguistic study of what I call “affectedness constructions” (ACs) that express the notions of benefit and adversity. Since there is little research dealing with both benefactives and adversatives at the same time, the main goal of this dissertation is to establish AC as a grammatical category. First, many instances of ACs in the world are provided to show both the diversity of ACs and the consistent patterns among them. In some languages, a single construction indicates either benefit or adversity, depending on the context, while in others there is one or more individual benefactive and/or adversative construction(s). Since the event types that ACs indicate appear limited, I categorize the constructions by event type and discuss
  • Introduction to Japanese Computational Linguistics Francis Bond and Timothy Baldwin
    1 Introduction to Japanese Computational Linguistics
    Francis Bond and Timothy Baldwin
    The purpose of this chapter is to provide a brief introduction to the Japanese language, and natural language processing (NLP) research on Japanese. For a more complete but accessible description of the Japanese language, we refer the reader to Shibatani (1990), Backhouse (1993), Tsujimura (2006), Yamaguchi (2007), and Iwasaki (2013).
    1 A Basic Introduction to the Japanese Language
    Japanese is the official language of Japan, and belongs to the Japanese language family (Gordon, Jr., 2005).[1] The first-language speaker population of Japanese is around 120 million, based almost exclusively in Japan. The official version of Japanese, e.g. used in official settings and by the media, is called hyōjuNgo “standard language”, but Japanese also has a large number of distinctive regional dialects. Other than lexical distinctions, common features distinguishing Japanese dialects are case markers, discourse connectives and verb endings (Kokuritsu Kokugo Kenkyujyo, 1989–2006).
    [1] There are a number of other languages in the Japanese language family of Ryukyuan type, spoken in the islands of Okinawa. Other languages native to Japan are Ainu (an isolated language spoken in northern Japan, and now almost extinct: Shibatani (1990)) and Japanese Sign Language.
    Readings in Japanese Natural Language Processing. Francis Bond, Timothy Baldwin, Kentaro Inui, Shun Ishizaki, Hiroshi Nakagawa and Akira Shimazu (eds.). Copyright © 2016, CSLI Publications.
    2 The Sound System
    Japanese has a relatively simple sound system, made up of 5 vowel phonemes (/a/, /i/, /u/, /e/ and /o/), 9 unvoiced consonant phonemes (/k/, /s/, /t/, /n/, /h/, /m/, /j/, /ó/ and /w/), 4 voiced consonants (/g/, /z/, /d/ and /b/), and one semi-voiced consonant (/p/).
  • General Explanations Nihon Kokugo Daijiten Editorial Policy 1
    General Explanations: Nihon kokugo daijiten
    Editorial Policy
    1. This dictionary attempts to offer a historical account of the meanings and usages of the Japanese language through reference to various written materials.
    2. Entry items include the vocabulary of modern Japanese as well as items from historical texts. Proper nouns, including names of places and people, and technical and specialized terms are also included.
    3. Definitions of a given word are generally arranged in historical order; usage citations are accompanied by the name of the text in which they appear.
    4. Sources for citations are taken from a broad spectrum of texts including literary, historical, religious, and other works of various periods.
    5. Citation sources range from works of antiquity to works of the Meiji, Taishō, and Shōwa periods. Chinese texts are also used for Sino-Japanese words.
    6. Citations are taken from the most reliable versions of historical texts; in cases where variant texts are cited, notice is given.
    7. Identification of citations is as specific as possible. In order to facilitate comprehension, some citations include the author's name and the field with which the text is associated.
    8. Separate subheadings for Dialectal Variants, Etymology, Pronunciation, and Premodern Dictionary Citations are included with commentary where appropriate.
    9. Entry headings and definitions are based on modern standards, and are intended to make location and comprehension as easy as possible.
    Components of the Descriptions
    The descriptions of words in this dictionary are composed of the following elements: entry heading, historical kana orthography, kanji, part of speech, definitions, examples and sources, supplementary notes, dialectal variants, etymology, pronunciation, and premodern dictionary citations.
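
The rendaku and Lyman's Law rules quoted in the Kawahara (2018) abstract in the list above can be sketched in a few lines of Python. This is a simplified illustration over romanized strings, not the paper's analysis: the function name and the voicing table are assumptions, the rule is applied to every compound (real rendaku is lexically irregular), and the paper itself argues the process operates on dakuten diacritics rather than on romanized segments.

```python
# Voiced obstruents in a simple romanization (assumed inventory).
VOICED_OBSTRUENTS = set("bdgzj")

# Initial consonant -> voiced counterpart; digraphs are listed before single letters.
VOICING = [("sh", "j"), ("ch", "j"), ("ts", "z"),
           ("k", "g"), ("s", "z"), ("t", "d"), ("h", "b"), ("f", "b")]


def apply_rendaku(first, second):
    """Voice the initial consonant of `second` unless Lyman's Law blocks it."""
    # Lyman's Law: a voiced obstruent already in the second member blocks rendaku.
    if any(ch in VOICED_OBSTRUENTS for ch in second):
        return first + second
    for plain, voiced in VOICING:
        if second.startswith(plain):
            return first + voiced + second[len(plain):]
    # No voiceable initial consonant (e.g. vowel-initial): plain concatenation.
    return first + second


print(apply_rendaku("oo", "tako"))    # rendaku applies
print(apply_rendaku("oo", "tokage"))  # blocked by the /g/ in tokage
```

The two calls reproduce the examples in the abstract: /oo/ + /tako/ and /oo/ + /tokage/.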