Generating Diverse Katakana Variants Based on Phonemic Mapping
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Writing As Aesthetic in Modern and Contemporary Japanese-Language Literature
At the Intersection of Script and Literature: Writing as Aesthetic in Modern and Contemporary Japanese-language Literature Christopher J Lowy A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2021 Reading Committee: Edward Mack, Chair Davinder Bhowmik Zev Handel Jeffrey Todd Knight Program Authorized to Offer Degree: Asian Languages and Literature ©Copyright 2021 Christopher J Lowy University of Washington Abstract At the Intersection of Script and Literature: Writing as Aesthetic in Modern and Contemporary Japanese-language Literature Christopher J Lowy Chair of the Supervisory Committee: Edward Mack Department of Asian Languages and Literature This dissertation examines the dynamic relationship between written language and literary fiction in modern and contemporary Japanese-language literature. I analyze how script and narration come together to function as a site of expression, and how they connect to questions of visuality, textuality, and materiality. Informed by work from the field of textual humanities, my project brings together new philological approaches to visual aspects of text in literature written in the Japanese script. Because research in English on the visual textuality of Japanese-language literature is scant, my work serves as a fundamental first-step in creating a new area of critical interest by establishing key terms and a general theoretical framework from which to approach the topic. Chapter One establishes the scope of my project and the vocabulary necessary for an analysis of script relative to narrative content; Chapter Two looks at one author’s relationship with written language; and Chapters Three and Four apply the concepts explored in Chapter One to a variety of modern and contemporary literary texts where script plays a central role. -
Legacy Character Sets & Encodings
Legacy & Not-So-Legacy Character Sets & Encodings Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/unicode/iuc15-tb1-slides.pdf Tutorial Overview dc • What is a character set? What is an encoding? • How are character sets and encodings different? • Legacy character sets. • Non-legacy character sets. • Legacy encodings. • How does Unicode fit it? • Code conversion issues. • Disclaimer: The focus of this tutorial is primarily on Asian (CJKV) issues, which tend to be complex from a character set and encoding standpoint. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations dc • GB (China) — Stands for “Guo Biao” (国标 guóbiâo ). — Short for “Guojia Biaozhun” (国家标准 guójiâ biâozhün). — Means “National Standard.” • GB/T (China) — “T” stands for “Tui” (推 tuî ). — Short for “Tuijian” (推荐 tuîjiàn ). — “T” means “Recommended.” • CNS (Taiwan) — 中國國家標準 ( zhôngguó guójiâ biâozhün) in Chinese. — Abbreviation for “Chinese National Standard.” 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • GCCS (Hong Kong) — Abbreviation for “Government Chinese Character Set.” • JIS (Japan) — 日本工業規格 ( nihon kôgyô kikaku) in Japanese. — Abbreviation for “Japanese Industrial Standard.” — 〄 • KS (Korea) — 한국 공업 규격 (韓國工業規格 hangug gongeob gyugyeog) in Korean. — Abbreviation for “Korean Standard.” — ㉿ — Designation change from “C” to “X” on August 20, 1997. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated Terminology & Abbreviations (Cont’d) dc • TCVN (Vietnam) — Tiu Chun Vit Nam in Vietnamese. — Means “Vietnamese Standard.” • CJKV — Chinese, Japanese, Korean, and Vietnamese. 15th International Unicode Conference Copyright © 1999 Adobe Systems Incorporated What Is A Character Set? dc • A collection of characters that are intended to be used together to create meaningful text. -
Implementing Cross-Locale CJKV Code Conversion
Implementing Cross-Locale CJKV Code Conversion Ken Lunde CJKV Type Development Adobe Systems Incorporated bc ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-paper.pdf ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/unicode/iuc13-c2-slides.pdf Code Conversion Basics dc • Algorithmic code conversion — Within a single locale: Shift-JIS, EUC-JP, and ISO-2022-JP — A purely mathematical process • Table-driven code conversion — Required across locales: Chinese ↔ Japanese — Required when dealing with Unicode — Mapping tables are required — Can sometimes be faster than algorithmic code conversion— depends on the implementation September 10, 1998 Copyright © 1998 Adobe Systems Incorporated Code Conversion Basics (Cont’d) dc • CJKV character set differences — Different number of characters — Different ordering of characters — Different characters September 10, 1998 Copyright © 1998 Adobe Systems Incorporated Character Sets Versus Encodings dc • Common CJKV character set standards — China: GB 1988-89, GB 2312-80; GB 1988-89, GBK — Taiwan: ASCII, Big Five; CNS 5205-1989, CNS 11643-1992 — Hong Kong: ASCII, Big Five with Hong Kong extension — Japan: JIS X 0201-1997, JIS X 0208:1997, JIS X 0212-1990 — South Korea: KS X 1003:1993, KS X 1001:1992, KS X 1002:1991 — North Korea: ASCII (?), KPS 9566-97 — Vietnam: TCVN 5712:1993, TCVN 5773:1993, TCVN 6056:1995 • Common CJKV encodings — Locale-independent: EUC-*, ISO-2022-* — Locale-specific: GBK, Big Five, Big Five Plus, Shift-JIS, Johab, Unified Hangul Code — Other: UCS-2, UCS-4, UTF-7, UTF-8, -
JFP Reference Manual 5 : Standards, Environments, and Macros
JFP Reference Manual 5 : Standards, Environments, and Macros Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 817–0648–10 December 2002 Copyright 2002 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, docs.sun.com, AnswerBook, AnswerBook2, and Solaris are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements. -
Implementing Cross-Locale CJKV Code Conversion
Implementing Cross-locale CJKV Code Conversion Ken Lunde, Adobe Systems Incorporated [email protected] http://www.oreilly.com/~lunde/ 1. Introduction Most operating systems today deal with single locales. Within a single CJKV locale, different operating sys- tems often use different encodings for the same character set. Consider Shift-JIS and EUC-JP encodings for Japanese—Shift-JIS is historically used on MacOS and Windows, but EUC-JP is used on Unix. This makes code conversion a necessity. Code conversion within a single locale is, by and large, a trivial operation that typically involves a mathematical algorithm. In the past, a lot of code conversion was performed by users through dedicated software tools. Many of today’s applications include built-in code conversion routines, but these routines deal with only multiple encodings of a single locale (such as EUC-KR, ISO-2022-KR, Johab, and Unified hangul Code encodings for Korean). Code conversion across CJKV locales, such as between Chinese and Japanese, is more problematic. While Unicode serves as an excellent basis for implementing cross-locale code conversion, there are still problems to be addressed, such as unmappable characters. 2. Code Conversion Basics Converting between different encodings of a single locale, which represents a trivial effort that involves well- established code conversion algorithms (or mapping tables), is a well-understood process these days. How- ever, as soon as code conversion extends beyond a single locale, there are additional complexities that arise, such as the following: • Code conversion algorithms almost always must be replaced by mapping tables because the ordering of characters in different CJKV character sets are different. -
Sveučilište Josipa Jurja Strossmayera U Osijeku Filozofski Fakultet U Osijeku Odsjek Za Engleski Jezik I Književnost Uroš Ba
CORE Metadata, citation and similar papers at core.ac.uk Provided by Croatian Digital Thesis Repository Sveučilište Josipa Jurja Strossmayera u Osijeku Filozofski fakultet u Osijeku Odsjek za engleski jezik i književnost Uroš Barjaktarević Japanese-English Language Contact / Japansko-engleski jezični kontakt Diplomski rad Kolegij: Engleski jezik u kontaktu Mentor: doc. dr. sc. Dubravka Vidaković Erdeljić Osijek, 2015. 1 Summary JAPANESE-ENGLISH LANGUAGE CONTACT The paper examines the language contact between Japanese and English. The first section of the paper defines language contact and the most common contact-induced language phenomena with an emphasis on linguistic borrowing as the dominant contact-induced phenomenon. The classification of linguistic borrowing thereby follows Haugen's distinction between morphemic importation and substitution. The second section of the paper presents the features of the Japanese language in terms of origin, phonology, syntax, morphology, and writing. The third section looks at the history of language contact of the Japanese with the Europeans, starting with the Portuguese and Spaniards, followed by the Dutch, and finally the English. The same section examines three different borrowing routes from English, and contact-induced language phenomena other than linguistic borrowing – bilingualism , code alternation, code-switching, negotiation, and language shift – present in Japanese-English language contact to varying degrees. This section also includes a survey of the motivation and reasons for borrowing from English, as well as the attitudes of native Japanese speakers to these borrowings. The fourth and the central section of the paper looks at the phenomenon of linguistic borrowing, its scope and the various adaptations that occur upon morphemic importation on the phonological, morphological, orthographic, semantic and syntactic levels. -
Hiragana and Katakana Worksheets Free
201608 Hiragana and Katakana worksheets ひらがな カタカナ 1. Three types of letters ··········································· 1 _ 2. Roma-ji, Hiragana and Katakana ·························· 2 3. Hiragana worksheets and quizzes ························ 3-9 4. The rules in Hiragana···································· 10-11 5. The rules in Katakana ········································ 12 6. Katakana worksheets and quizzes ··················· 12-22 Japanese Language School, Tokyo, Japan Meguro Language Center TEL.: 03-3493-3727 Email: [email protected] http://www.mlcjapanese.co.jp Meguro Language Center There are three types of letters in Japanese. 1. Hiragana (phonetic sounds) are basically used for particles, words and parts of words. 2. Katakana (phonetic sounds) are basically used for foreign/loan words. 3. Kanji (Chinese characters) are used for the stem of words and convey the meaning as well as sound. Hiragana is basically used to express 46 different sounds used in the Japanese language. We suggest you start learning Hiragana, then Katakana and then Kanji. If you learn Hiragana first, it will be easier to learn Katakana next. Hiragana will help you learn Japanese pronunciation properly, read Japanese beginners' textbooks and write sentences in Japanese. Japanese will become a lot easier to study after having learned Hiragana. Also, as you will be able to write sentences in Japanese, you will be able to write E-mails in Hiragana. Katakana will help you read Japanese menus at restaurants. Hiragana and Katakana will be a good help to your Japanese study and confortable living in Japan. To master Hiragana, it is important to practice writing Hiragana. Revision is also very important - please go over what you have learned several times. -
Discriminative Method for Japanese Kana-Kanji Input Method
Discriminative Method for Japanese Kana-Kanji Input Method Hiroyuki Tokunaga Daisuke Okanohara Shinsuke Mori Preferred Infrastructure Inc. Kyoto University 2-40-1 Hongo, Bunkyo-ku, Tokyo Yoshidahonmachi, Sakyo-ku, Kyoto 113-0033, Japan 606-8501, Japan tkng,hillbig @preferred.jp [email protected] { } Abstract Early kana-kanji conversion systems employed heuristic rules, such as maximum-length-word The most popular type of input method matching. in Japan is kana-kanji conversion, conver- With the growth of statistical natural language sion from a string of kana to a mixed kanji- processing, data-driven methods were applied for kana string. kana-kanji conversion. Most existing studies on However there is no study using discrim- kana-kanji conversion have used probability mod- inative methods like structured SVMs for els, especially generative models. Although gen- kana-kanji conversion. One of the reasons erative models have some advantages, a number is that learning a discriminative model of studies on natural language tasks have shown from a large data set is often intractable. that discriminative models tend to overcome gen- However, due to progress of recent re- erative models with respect to accuracy. searches, large scale learning of discrim- However, there have been no studies using only inative models become feasible in these a discriminative model for kana-kanji conversion. days. One reason for this is that learning a discriminative model from a large data set is often intractable. In the present paper, we investigate However, due to progress in recent research on whether discriminative methods such as machine learning, large-scale learning of discrim- structured SVMs can improve the accu- inative models has become feasible. -
Katakana Chart
かたかな Katakana Chart W R Y M H N T S K VOWEL ン ワ ラ ヤ マ ハ ナ タ サ カ ア A リ ミ ヒ ニ チ シ キ イ I ル ユ ム フ ヌ ツ ス ク ウ U レ メ ヘ ネ テ セ ケ エ E ヲ ロ ヨ モ ホ ノ ト ソ コ オ O © 2010 Michael L. Kluemper et al. Beginning Japanese, Tuttle Publishing, an imprint of Periplus Editions (HK) Ltd. All rights reserved. www.TimeForJapanese.com. 22 Beginning Japanese 名前: ________________________ 1-1 Katakana Activity Book 日付: ___月 ___日 一、 Practice: アイウエオ カキクケコ ガギグゲゴ O E U I A オ エ ウ イ ア ア オ エ ウ イ ア オ ウ ア エ イ ア オ エ ウ イ オ ウ イ ア オ エ ア KO KE KU KI KA コ ケ ク キ カ カ コ ケ ク キ カ コ ケ ク ク キ カ カ コ キ キ カ コ コ ケ カ ケ ク ク キ キ コ ケ カ © 2010 Michael L. Kluemper et al. Beginning Japanese, Tuttle Publishing, an imprint of Periplus Editions (HK) Ltd. All rights reserved. www.TimeForJapanese.com. 23 GO GE GU GI GA ゴ ゲ グ ギ ガ ガ ゴ ゲ グ ギ ガ ゴ ゴ ゲ グ グ ギ ギ ガ ガ ゴ ゲ ギ ガ ゴ ゴ ゲ ガ ゲ グ グ ギ ギ ゴ ゲ ガ © 2010 Michael L. Kluemper et al. Beginning Japanese, Tuttle Publishing, an imprint of Periplus Editions (HK) Ltd. All rights reserved. -
Adobe Technical Note #5078: the Adobe-Japan1-6 Character Collection 2
Adobe Enterprise & Developer Support bc Adobe Technical Note #5078 The Adobe-Japan1-6 Character Collection Introduction The purpose of this document is to define and describe the Adobe-Japan1-6 character collection, which enumerates 23,058 glyphs, and whose designation is derived from the following three /CIDSystemInfo dictionary entries: ● /Registry (Adobe) ● /Ordering (Japan1) ● /Supplement 6 CIDFont resources that reference this character collection must include a /CIDSystemInfo dictionary that matches the /Registry and /Ordering strings shown above. This document is designed for font developers, for the purpose of developing Japanese fonts for use with PostScript products, or for developing OpenType Japanese fonts. It is also useful for application developers and end users who need to know more about the glyphs in this character collection. This document expects that its readers are familiar with the CID-keyed font file format, which is described in Adobe Technical Note #5014, entitled Adobe CMap and CIDFont Files Specification.* A character collection contains the glyphs that are required to develop font products for a specific language, script, or market. Specific encodings are defined through the use of CMap resources that are instantiated as files, and generally reference a subset of the character collection. The character collection that results from each Supplement includes the glyphs associated with all earlier Supplements. For example, Supplement 6 includes all glyphs defined in Supplements 0 through 5. The Adobe-Japan1-6 character collection enumerates 23,058 glyphs, specifically CIDs 0 through 23057, among seven Supplements, designated 0 through 6. Adobe-Japan1-6 completely supports the current JIS (Japanese Industrial Standard) character set standards, and earlier vintages thereof, specifically JIS X 0208:1997, JIS X 0213:2004, and JIS X 0212-1990. -
The ALA-LC Japanese Romanization Table
Japanese The ALA-LC Japanese Romanization Table Revision Proposal March 30, 2018 Created by CTP/CJM Joint Working Group on the ALA-LC Japanese Romanization Table (JRTWG) JRTWG is part of: Association of Asian Studies (AAS) Council on East Asian Libraries (CEAL) Committee on Technical Processing (CTP) and Committee on Japanese Materials (CJM) JRTWG Co-Chairs: Yukari Sugiyama (Yale University) Chiharu Watsky (Princeton University) Members of JRTWG: Rob Britt (University of Washington) Ryuta Komaki (Washington University) Fabiano Rocha (University of Toronto) Chiaki Sakai (The Japan Foundation Japanese-Language Institute, Urawa) Keiko Suzuki (The New School) [Served as Co-Chair from April to September 2016] Koji Takeuchi (Library of Congress) Table of Contents Romanization Table…………………………………………………………………………………….Pages 1-28 1. Introduction……………………………………………………………………………………Page 1 2. Basic Principles for Romanization……………………………………………………Page 1 3. Capitalization………………………………………………………………………………….Page 5 4. Japanese Punctuation and Typographical Marks…………………………….Page 8 5. Diacritic Marks and Other Symbols Used in Romanization………………Page 9 6. Word Division………………………………………………………………………………….Page 11 7. Numerals…………………………………………………………………………………………Page 25 Romanized/Kana Equivalent Charts…………………………………………………..…..…….Pages 29-30 Helpful References…………………………………………………………………………………....…Page 31 Last Updated: 3/30/2018 10:48 AM 1 Introduction: Scope of the Romanization Table Romanization is one type of transliteration. Transliteration is the process of converting text written -
Color Coding Japanese Kana for Second Language Acquisition
Color Coding Japanese Kana for Second Language Acquisition Kevin Reay Wrobetz, Kobe Gakuin University, Japan The Asian Conference on Language Learning 2019 Official Conference Proceedings Abstract In this research, the pedagogical effects of endowing Japanese Hiragana with alphabetical properties by phonetically mapping the Kana syllabary to their respective vowel phonemes with a color code as well as mapping diacritical marks color coded to each character’s respective consonant group are studied. This phonetic color coding system mapped to Japanese Kana was tested on a group of native English speakers with no prior knowledge of Japanese, and the results of a series of six tests examining Kana acquisition, pronunciation accuracy, and vocabulary retention were weighed against the results of a control group who received instruction without said color coding system. This phonetic color coding system proves to be far superior to the instructional methods used without the system in three distinct categories. First, the phonetic color coding system, once learned, allows the learner to forego all romanization and instead use only the Kana characters during study thus increasing the speed of Kana acquisition. Second, by not using romanizations to guide pronunciation, the learner is unaffected by the phonetic rules governing the English Latin alphabet thus improving the accuracy of pronunciation. Third, mapping a phonetic color code to a writing system arguably increases the retention rate of vocabulary. The implementation of this phonetic color coding system elicited striking improvements in the abovementioned categories throughout all six tests carried out in this study. Keywords: Japanese as a Foreign Language, Second Language Acquisition, Phonology, Phonetic Color Coding, L2 Vocabulary Acquisition iafor The International Academic Forum www.iafor.org Introduction Japanese is considered to be one of the most difficult languages for native English speakers to acquire proficiency in.