2012・9・16 PM6:00 ※Japanese Text, That Is Excluding Western Text

Books surveyed:
「ミレー」 (Millet)
「成吉思汗」 (Genghis Khan)
「国語の変遷」 (Changes in the National Language)
「サン・ペテルスブルグの夜話」 (The St. Petersburg Dialogues)
「歴史における個人の役割」 (The Role of the Individual in History)
「さらば愛しき女よ」 (Farewell, My Lovely)
「昭和経済史」 (Economic History of the Shōwa Era)
「銃・病原菌・鉄」 (Guns, Germs, and Steel)
「本づくりの常識・非常識」 (Common Sense and Nonsense in Bookmaking)
「いやでも楽しめる算数」 (Arithmetic You Can't Help Enjoying)
「ぎりぎり合格への論文マニュアル」 (A Manual for Papers That Just Barely Pass)
「超・文章法」 (Ultra Writing Method)
「横書き登場」 (The Arrival of Horizontal Writing)
「5分で楽しむ数学50話」 (50 Math Stories to Enjoy in 5 Minutes)
「戦後日本経済史」 (Postwar Japanese Economic History)
「一冊でつかめる!中国近現代史」 (Modern Chinese History in One Volume!)
「小説作法ABC」 (The ABC of Writing Fiction)
「文字講座」 (Lectures on Characters)
「デフレの正体」 (The True Nature of Deflation)
「刑務所なう」 (Prison, Now)
「さっさと不況を終わらせろ」 (End This Depression Now!)
「円のゆくえを問い直す」 (Rethinking the Future of the Yen)
「新版論文の教室」 (The Paper-Writing Classroom, New Edition)
「不愉快な真実」 (Unpleasant Truths)
Year published (last two digits): 12 12 12 12 12 10 09 09 09 08 07 03 02 01 01 00 00 86 76 58 48 41 40 39

Reference data: http://unicode.org/reports/tr50/tr50-5.Orientation.txt
Orientation codes (as in UTR #50): U = upright, R = rotated 90° clockwise, Tu / Tr = transformed typographically, falling back to upright / rotated.
Tate-chū-yoko (horizontal-in-vertical) runs are also excluded. L′ (L prime) is set tate-chū-yoko, so the prime in that case is not counted as U. The same applies when -1 is set tate-chū-yoko: the "-" is treated as part of horizontal text.
Cases where Z=14 is set sideways as a whole are also excluded.
Symbols inside a sideways-rotated context are excluded, regardless of the length of the run.

Columns: code point; full-width (half-width) code point; name of character; glyph example; orientation values; marks observed in the surveyed books.
0000..001F U U R
0020 U R R
0021 FF01 Exclamation mark ! ! U U R U UUUUUUUU UUU U UUUUU
0022 FF02 Quotation mark " " U U R R
0023 FF03 Number sign # # U U R U
0024 FF04 Dollar sign $ $ U U R U
0025 FF05 Percent sign % % U U R UUU UU UUUUU UUU
0026 FF06 Ampersand & & U U R UUU U U
0027 FF07 Apostrophe ' ' U U R
0028 FF08 Left parenthesis ( ( U
0029 FF09 Right parenthesis ) ) U
002A FF0A Asterisk * * U U R UU U U UU UU U U
002B FF0B Plus sign + + U U R
002C FF0C Comma , , U U R R RU*
002D FF0D Hyphen-minus - - U R R R U
002E FF0E Full stop . . U U R R R
002F FF0F Solidus / / U U R UU UUUU U R U
0030..0039 FF10..FF19 0~9 U U R
003A FF1A Colon : : U U RRRRRR R RR
003B FF1B Semicolon ; ; U U R R RU* U
003C FF1C Less-than sign < < U U R R R
003D FF1D Equals sign = = U U RRRRRRRRRRRR RR
003E FF1E Greater-than sign > > U U R R R R
003F FF1F Question mark ? ? U U R UUUUUUUUUUUUUU U UU U
0040 FF20 Commercial at @ @ U U R
0041..005A FF21..FF3A A~Z U U R
005B FF3B Left square bracket [ [ U R R R R R R R
005C FF3C Reverse solidus \ \ U U R
005D FF3D Right square bracket ] ] U R R R R R R R
005E FF3E Circumflex accent ^ ^ U U R
005F FF3F Low line _ _ U R R
0060 FF40 Grave accent ` ` U U R
0061..007A FF41..FF5A Lowercase a~z U U R
007B FF5B Left curly bracket { { U R R R
007C FF5C, FFE8 (half width) Vertical line | | U U R
007D FF5D Right curly bracket } } U R R R
007E Tilde ~ U U R
FF5E Tilde (full width) ~ U Tr Tr
007F..009F U U R
00A0 U R R
00A1 ¡ U U R
00A2 ¢ U U R
00A3 POUND SIGN £ U U R
00A4 U U R
00A5 YEN SIGN ¥ U U R
00A6 U U R
00A7 § U U U U
00A8 Dieresis ¨ U U R
00A9 © U U U U
00AA U U R
00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK « U R R
00AC U U R
00AD U U R
00AE ® U U U
00AF Macron ¯ U U R
00B0 DEGREE SIGN ° U U U
00B1 U U U
00B2 U U R
00B3 U U R
00B4 U U R
00B5 U U R
00B6 ¶ U U U U
00B7 Middle dot (Western) · U U R
00B8 U U R
00B9 U U R
00BA U U R
00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK » U R R
00BC U U U
00BD U U U
00BE U U U
00BF ¿ U U R
00C0..00D6 U U R
00D7 U U U
00D8..00F6 U U R
00F7 Division sign ÷ U U U R U U R
00F8..00FF U U R
0100..017F U U R
0180..024F U U R
0250..02AF U U R
02D8 Breve ˘ U U R
02DD Double acute accent ˝ U U R
02B0..02E4 U U R
02E5 U U U
02E6 U U U
02E7 U U U
02E8 U U U
02E9 U U U
02EA U U U
02EB U U U
02EC..02FF U U R
0300..036F U U R
0370..03FF U U R
03B1 α U U RU UU R
03B7 η U U R R
03C0 π U U R U RU
03C1 ρ U U R R
03C7 χ U U R R
03DB ϛ U U R U
0400..04FF U U R
0500..052F U U R
0530..0589 U U R
058A U R R
058B..058F U U R
0590..05BD U U R
05BE U R R
05BF..05FF U U R
0600..06FF U U R
0700..074F U U R
0750..077F U U R
0780..07BF U U R
07C0..07FF U U R
0800..083F U U R
0840..085F U U R
0860..089F U U R
08A0..08FF U U R
0900..097F U U R
0980..09FF U U R
0A00..0A7F U U R
0A80..0AFF U U R
0B00..0B7F U U R
0B80..0BFF U U R
0C00..0C7F U U R
0C80..0CFF U U R
0D00..0D7F U U R
0D80..0DFF U U R
0E00..0E7F U U R
0E80..0EFF U U R
0F00..0FFF U U R
1000..109F U U R
10A0..10FF U U R
1100..11FF U U U
1200..137F U U R
1380..139F U U R
13A0..13FF U U R
1400 U R R
1401..167F U U U
1680..169F U R R
16A0..16FF U U R
1700..171F U U R
1720..173F U U R
1740..175F U U R
1760..177F U U R
1780..17FF U U R
1800..18AF L U U
18B0..18FF U U U
1900..194F U U R
1950..197F U U R
1980..19DF U U R
19E0..19FF U U R
1A00..1A1F U U R
1A20..1AAF U U R
1B00..1B7F U U R
1B80..1BBF U U R
1BC0..1BFF U U R
1C00..1C4F U U R
1C50..1C7F U U R
1CC0..1CCF U U R
1CD0..1CFF U U R
1D00..1D7F U U R
1D80..1DBF U U R
1DC0..1DFF U U R
1E00..1EFF U U R
1F00..1FFF U U R
2000..200A U R R
200B..200F U U R
2010 Hyphen ‐ U R R R RU、R R
2011 Non-breaking hyphen ‑ U R R
2012 Figure dash ‒ U R R
2013 En dash – U R RR R R RRR URR
2014 Em dash — U R RRRRR RRRRRRRRR U、R RRRRRRRR
2015 Horizontal bar ― U R R U
2016 Double vertical bar ‖ U U U
2017 Double low line ‗ U U U
2018 Left single quotation mark ‘ U R T R
2019 Right single quotation mark ’ U R T R
201A Single low-9 quotation mark ‚ U R R
201B Single high-reversed-9 quotation mark ‛ U R R
201C Left double quotation mark “ U R RRR R UR R
201D Right double quotation mark ” U R R R R R U R R
201E Double low-9 quotation mark „ U R R
201F Double high-reversed-9 quotation mark ‟ U R R
2020 Dagger † U U UUU UU
2021 Double dagger ‡ U U U U
2022 Bullet • U U U
2023 Triangular bullet ‣ U U U
2024 One dot leader ․ U R R
2025 Two dot leader ‥ U R R
2026 ※ Horizontal ellipsis … U R RRRRRRRRRRRRRRRRRRRRRRR R
2027 Hyphenation point ‧ U U R
2028..2029 U U R
202A..202E U U R
202F U R R
2030 PER MILLE SIGN ‰ U U U
2031 ‱ U U U
2032 Prime ′ U U U U
2033 Double prime ″ U U U
2034 Triple prime ‴ U U U
2035 Reversed prime ‵ U U U
2036 ‶ U U U
2037 ‷ U U U
2038 Caret ‸ U U R
2039 Single left-pointing angle quotation mark ‹ U R R
203A Single right-pointing angle quotation mark › U R R U
203B ※ U U U U
203C ‼ U U UU U
203D ‽ U U U
203E ‾ U U R
203F ‿ U R R
2040 ⁀ U R R
2041 ⁁ U U R
2042 ⁂ U U U
2043 ⁃ U U U
2044 ⁄ U R R
2045 ⁅ U R R
2046 ⁆ U R R
2047 ⁇ U U U
2048 ⁈ U U U
2049 ⁉ U U U U U
204A U U R
204B U U R
204C U U R
204D U U R
204E U U R
204F U U R
2050 U U R
2051 ⁑ U U U
2052 U U R
2053 U U R
2054 U R R
2055 U U R
2056 U U R
2057 U U U
2058 U U R
2059 U U R
205A U U R
205B U U R
205C U U R
205D U U R
205E U U R
205F U R R
2060..2064 U U R
2065 U U U
2066 U U U
2067 U U U
2068 U U U
2069 U U U
206A..206F U U R
2070..209F U U R
20A0..20AB U U U
20AC EURO SIGN € U U U
20AD..20CF U U U
20D0..20FF U U U
2100 U U U
2101 U U U
2102 U U U
2103 DEGREE CELSIUS ℃ U U U U U
2104 U U U
2105 U U U
2106 U U U
2107 U U U
2108 U U U
2109 U U U
210A U U U
210B U U U
210C U U U
210D U U U
210E U U U
210F U U U
2110 U U U
2111 U U U
2112 U U U
2113 SCRIPT SMALL L U U U
2114 U U U
2115 U U U
2116 NUMERO SIGN U U U
2117 U U U
2118 U U R
2119 U U U
211A U U U
211B U U U
211C U U U
211D U U U
211E U U U
211F U U U
2120 U U U
2121 U U U
2122 U U U
2123 U U U
2124 U U U
2125 U U U
2126 U U U
2127 U U U
2128 U U U
2129 U U U
212A U U U
212B U U U
212C U U U
212D U U U
212E U U U
212F U U U
2130 U U U
2131 U U U
2132 U U U
2133 U U U
2134 U U U
2135 U U U
2136 U U U
2137 U U U
2138 U U U
2139 U U U
213A U U U
213B U U U
213C U U U
213D U U U
213E U U U
213F U U U
2140 U U R
2141 U U R
2142 U U R
2143 U U R
2144 U U R
2145 U U U
2146 U U U
2147 U U U
2148 U U U
2149 U U U
214A U U U
214B U U R
214C U U U
214D U U U
214E U U U
214F U U U
2150..215F Vulgar fractions 1/7~ U U U
2160..217F Roman numerals ⅠⅡ~ U U UU UU U
2180..218F U U U
2190 ← U U R ?
2191 ↑ U U R U1) U?
2192 → U U RRRRRRR R?,RRR R
2193 ↓ U U R ?
2194 ↔ U U R
2195 ↕ U U R
2196 ↖ U U R
2197 ↗ U U R
2198 ↘ U U R
2199 ↙ U U R
219A..21FF U U R
21D4 ⇔ U U R R
2200..2211 U U R
2211 N-ary summation ∑ U U R U U
2212 Minus sign − U U R U
2213 U U R
2214 U U R
2215 Division slash ∕ U U R
2216..22FF U U R
221A Square root √ U U R U
222B Integral ∫ U U R U
2300..2307 U U U
2308..230B U U R
230C..231F U U U
2320 U U R
2321 U U R
2322..2328 U U U
2329 U R U
232A U R U
232B U U U
232C..237C U U R
237D..239A U U U
239B..23B3 U U R
23B4..23B6 U U U
23B7..23B9 U U R
23BA..23CF U U U
23D0 U U R
23D1..23DB U U U
23DC..23E1 U U R
23E2..23FF U U U
2400..243F U U U
2440..245F U U U
2460..2469 ①~ U U U U UU UUU
246A..24FF U U U
24D0 ⓐ U U U U
2500..257F U R R
2580..259F U R R
25A0..25B2 U U R
25B2 ▲ U U R U
25B3 △ U U R UUU
25B4..25FF U U R
25BC ▼ U U R U
25BD ▽ U U R U
2600..2604 U U U
2605 ★ U U UU U U
2606 ☆ U U U UU U
2607..2619 U U U
261A..261F U U R
2620..26FF U U U
2700..2767 U U U
2768..2775 U R U
2776..2793 ❶ U U U U
2794..27BF U U R
27C0..27C4 U U R
27C5..27C6 U R R
27C7..27E5 U U R
27E6..27EF U R R
27F0..27FF U U R
2800..28FF U U U
2900..297F U U R
2980..2982 U U R
2983..2998 U R R
2999..29D7 U U R
29D8..29DB U R R
29DC..29FB U U R
29FC..29FD U R R
29FE..29FF U U R
2A00..2AFF U U R
2B00..2B11 U U R
2B12..2B2F U U U
2B30..2B4C U U R
2B4D..2BFF U U U
2C00..2C5F U U R
2C60..2C7F U U R
2C80..2CFF U U R
2D00..2D2F U U R
2D30..2D7F U U R
2D80..2DDF U U R
2DE0..2DFF U U R
2E00..2E16 U U R
2E17 U R R
2E18..2E19 U U R
2E1A U R R
2E1B..2E1F U U R
2E20..2E21 U R R
2E22..2E25 U R R
2E26..2E29 U R R
2E2A..2E39 U U R
2E3A U R R
2E3B U R R
2E3C..2E7F U U R
2E80..2EFF U U U
2F00..2FDF U U U
2FE0..2FEF U U U
2FF0..2FFF U U U
3000 U R U
3001 FF64 (half width) IDEOGRAPHIC COMMA 、 U Tu Tu U*
3002 FF61 (half width) IDEOGRAPHIC FULL STOP 。 U Tu Tu U*
3003 Ditto mark 〃 U U U U
3004 U U U
3005 IDEOGRAPHIC ITERATION MARK U U U
3006 〆 U U U U
3007 U U U
3008 Left angle bracket 〈 U Tr Tr RR RR RRR R RR
3009 Right angle bracket 〉 U Tr Tr RR RR RRR R RR
300A LEFT DOUBLE ANGLE BRACKET 《 U Tr Tr R R R R
300B 》 U Tr Tr R R R R
300C 「 U Tr Tr RRRRRRRRRRRRRRRRRRRRRRRR
300D 」 U Tr Tr RRRRRRRRRRRRRRRRRRRRRRRR
300E 『 U Tr Tr RRRRRRRRRRRRRRRRRRRR R R
300F 』 U Tr Tr RRRRRRRRRRRRRRRRRRRR R R
3010 【 U Tr Tr R R R R R
3011 】 U Tr Tr R R R R R
3012 U U U
3013 U U U
3014 〔 U Tr Tr R R RRR RRR
3015 〕 U Tr Tr R R
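Each orientation code in the table (U, R, Tu, Tr) is a per-code-point property, so a renderer only needs a range lookup. The following is a minimal Python sketch of such a lookup. It assumes a data file in the `start..end ; value # comment` layout of the UCD `VerticalOrientation.txt` file; the draft tr50-5 file cited above may use a different layout. The sample rows, the fallback value, and the helper names `parse_orientation` and `lookup` are illustrative assumptions, not the author's method.

```python
import bisect
from typing import List, Tuple

# Hypothetical sample in the "start..end ; value # comment" layout of the
# UCD VerticalOrientation.txt file. These rows are illustrative only.
SAMPLE = """\
0000..001F ; R  # controls
0041..005A ; R  # A~Z
3001       ; Tu # IDEOGRAPHIC COMMA
300C       ; Tr # LEFT CORNER BRACKET
FF01       ; U  # FULLWIDTH EXCLAMATION MARK
"""

# How a vertical-text renderer is expected to treat each value (UTR #50).
ACTIONS = {
    "U": "upright",
    "R": "rotate 90 degrees clockwise",
    "Tu": "vertical alternate glyph, else upright",
    "Tr": "vertical alternate glyph, else rotate",
}

Range = Tuple[int, int, str]


def parse_orientation(text: str) -> List[Range]:
    """Parse data lines into (start, end, value) triples sorted by start."""
    ranges: List[Range] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_points, value = (field.strip() for field in line.split(";"))
        start, _, end = code_points.partition("..")  # single code point: end == ""
        ranges.append((int(start, 16), int(end or start, 16), value))
    ranges.sort()
    return ranges


def lookup(ranges: List[Range], cp: int, default: str = "R") -> str:
    """Return the orientation of code point `cp`, or `default` if unlisted.

    The default of "R" is an assumption; use whatever fallback your data specifies.
    """
    starts = [start for start, _, _ in ranges]
    i = bisect.bisect_right(starts, cp) - 1  # last range starting at or before cp
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default


if __name__ == "__main__":
    table = parse_orientation(SAMPLE)
    for ch in "A\u3001\u300C\uFF01":
        value = lookup(table, ord(ch))
        print(f"U+{ord(ch):04X} {value}: {ACTIONS[value]}")
```

Binary search over sorted range starts keeps each lookup logarithmic in the number of ranges, which matters once the full file (hundreds of ranges, as in the table) is loaded instead of the five sample rows.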
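The table pairs each printable ASCII character (U+0021..U+007E) with its Fullwidth Form (U+FF01..U+FF5E). Because both blocks list the characters in the same order, the conversion is a fixed offset of 0xFEE0. The following is a minimal Python sketch; the function names are illustrative, and it deliberately ignores the other width pairs in the table, such as U+3001 / U+FF64. For those pairs, NFKC normalization folds the half-width form back to the standard one.

```python
import unicodedata

# Fullwidth Forms U+FF01..U+FF5E mirror printable ASCII U+0021..U+007E in order.
FULLWIDTH_OFFSET = 0xFF01 - 0x0021  # 0xFEE0


def to_fullwidth(text: str) -> str:
    """Replace printable ASCII (except space) with its Fullwidth Form."""
    return "".join(
        chr(ord(c) + FULLWIDTH_OFFSET) if 0x21 <= ord(c) <= 0x7E else c
        for c in text
    )


def to_halfwidth(text: str) -> str:
    """Inverse of to_fullwidth: map U+FF01..U+FF5E back to ASCII."""
    return "".join(
        chr(ord(c) - FULLWIDTH_OFFSET) if 0xFF01 <= ord(c) <= 0xFF5E else c
        for c in text
    )


if __name__ == "__main__":
    wide = to_fullwidth("Z=14")
    print(wide, [f"U+{ord(c):04X}" for c in wide])
    # NFKC also folds other width variants, e.g. U+FF64 -> U+3001.
    print(unicodedata.normalize("NFKC", "\uFF64") == "\u3001")
```

The space (U+0020) is left unchanged: the table gives it no full-width pair, and its wide counterpart, U+3000 IDEOGRAPHIC SPACE, lies outside the offset range.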