Jtc1/Sc2/Wg2 N2933
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721
Internet Engineering Task Force (IETF) P. Faltstrom, Ed. Request for Comments: 5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721 The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) Abstract This document specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an Internationalized Domain Name (IDN). It is part of the specification of Internationalizing Domain Names in Applications 2008 (IDNA2008). Status of This Memo This is an Internet Standards Track document. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5892. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. -
1 Symbols (2286)
1 Symbols (2286) USV Symbol Macro(s) Description 0009 \textHT <control> 000A \textLF <control> 000D \textCR <control> 0022 ” \textquotedbl QUOTATION MARK 0023 # \texthash NUMBER SIGN \textnumbersign 0024 $ \textdollar DOLLAR SIGN 0025 % \textpercent PERCENT SIGN 0026 & \textampersand AMPERSAND 0027 ’ \textquotesingle APOSTROPHE 0028 ( \textparenleft LEFT PARENTHESIS 0029 ) \textparenright RIGHT PARENTHESIS 002A * \textasteriskcentered ASTERISK 002B + \textMVPlus PLUS SIGN 002C , \textMVComma COMMA 002D - \textMVMinus HYPHEN-MINUS 002E . \textMVPeriod FULL STOP 002F / \textMVDivision SOLIDUS 0030 0 \textMVZero DIGIT ZERO 0031 1 \textMVOne DIGIT ONE 0032 2 \textMVTwo DIGIT TWO 0033 3 \textMVThree DIGIT THREE 0034 4 \textMVFour DIGIT FOUR 0035 5 \textMVFive DIGIT FIVE 0036 6 \textMVSix DIGIT SIX 0037 7 \textMVSeven DIGIT SEVEN 0038 8 \textMVEight DIGIT EIGHT 0039 9 \textMVNine DIGIT NINE 003C < \textless LESS-THAN SIGN 003D = \textequals EQUALS SIGN 003E > \textgreater GREATER-THAN SIGN 0040 @ \textMVAt COMMERCIAL AT 005C \ \textbackslash REVERSE SOLIDUS 005E ^ \textasciicircum CIRCUMFLEX ACCENT 005F _ \textunderscore LOW LINE 0060 ‘ \textasciigrave GRAVE ACCENT 0067 g \textg LATIN SMALL LETTER G 007B { \textbraceleft LEFT CURLY BRACKET 007C | \textbar VERTICAL LINE 007D } \textbraceright RIGHT CURLY BRACKET 007E ~ \textasciitilde TILDE 00A0 \nobreakspace NO-BREAK SPACE 00A1 ¡ \textexclamdown INVERTED EXCLAMATION MARK 00A2 ¢ \textcent CENT SIGN 00A3 £ \textsterling POUND SIGN 00A4 ¤ \textcurrency CURRENCY SIGN 00A5 ¥ \textyen YEN SIGN 00A6 -
Psftx-Centos-8.2/Pancyrillic.F16.Psfu Linux Console Font Codechart
psftx-centos-8.2/pancyrillic.f16.psfu Linux console font codechart Glyphs 0x000 to 0x0FF 0 1 2 3 4 5 6 7 8 9 A B C D E F 0x00_ 0x01_ 0x02_ 0x03_ 0x04_ 0x05_ 0x06_ 0x07_ 0x08_ 0x09_ 0x0A_ 0x0B_ 0x0C_ 0x0D_ 0x0E_ 0x0F_ Page 1 Glyphs 0x100 to 0x1FF 0 1 2 3 4 5 6 7 8 9 A B C D E F 0x10_ 0x11_ 0x12_ 0x13_ 0x14_ 0x15_ 0x16_ 0x17_ 0x18_ 0x19_ 0x1A_ 0x1B_ 0x1C_ 0x1D_ 0x1E_ 0x1F_ Page 2 Font information 0x018 U+2191 UPWARDS ARROW Filename: psftx-centos-8.2/pancyrillic.f16.psfu 0x019 U+2193 DOWNWARDS ARROW PSF version: 1 0x01A U+2192 RIGHTWARDS ARROW Glyph size: 8 × 16 pixels Glyph count: 512 0x01B U+2190 LEFTWARDS ARROW Unicode font: Yes (mapping table present) 0x01C U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK Unicode mappings 0x01D U+2040 CHARACTER TIE 0x000 U+FFFD REPLACEMENT CHARACTER 0x01E U+25B2 BLACK UP-POINTING 0x001 U+2022 BULLET TRIANGLE 0x01F U+25BC BLACK DOWN-POINTING 0x002 U+25C6 BLACK DIAMOND, TRIANGLE U+2666 BLACK DIAMOND SUIT 0x020 U+0020 SPACE 0x003 U+2320 TOP HALF INTEGRAL 0x021 U+0021 EXCLAMATION MARK 0x004 U+2321 BOTTOM HALF INTEGRAL 0x022 U+0022 QUOTATION MARK 0x005 U+2013 EN DASH 0x023 U+0023 NUMBER SIGN 0x006 U+2014 EM DASH 0x024 U+0024 DOLLAR SIGN 0x007 U+2026 HORIZONTAL ELLIPSIS 0x025 U+0025 PERCENT SIGN 0x008 U+201A SINGLE LOW-9 QUOTATION MARK 0x026 U+0026 AMPERSAND 0x009 U+201E DOUBLE LOW-9 0x027 U+0027 APOSTROPHE QUOTATION MARK 0x00A U+2018 LEFT SINGLE QUOTATION 0x028 U+0028 LEFT PARENTHESIS MARK 0x029 U+0029 RIGHT PARENTHESIS 0x00B U+2019 RIGHT SINGLE QUOTATION MARK 0x02A U+002A ASTERISK 0x00C U+201C LEFT DOUBLE -
ISO/IEC International Standard 10646-1
JTC1/SC2/WG2 N3381 ISO/IEC 10646:2003/Amd.4:2008 (E) Information technology — Universal Multiple-Octet Coded Character Set (UCS) — AMENDMENT 4: Cham, Game Tiles, and other characters such as ISO/IEC 8824 and ISO/IEC 8825, the concept of Page 1, Clause 1 Scope implementation level may still be referenced as „Implementa- tion level 3‟. See annex N. In the note, update the Unicode Standard version from 5.0 to 5.1. Page 12, Sub-clause 16.1 Purpose and con- text of identification Page 1, Sub-clause 2.2 Conformance of in- formation interchange In first paragraph, remove „, the implementation level,‟. In second paragraph, remove „, and to an identified In second paragraph, remove „with an implementation implementation level chosen from clause 14‟. level‟. In fifth paragraph, remove „, the adopted implementa- Page 12, Sub-clause 16.2 Identification of tion level‟. UCS coded representation form with imple- mentation level Page 1, Sub-clause 2.3 Conformance of de- vices Rename sub-clause „Identification of UCS coded repre- sentation form‟. In second paragraph (after the note), remove „the adopted implementation level,‟. In first paragraph, remove „and an implementation level (see clause 14)‟. In fourth and fifth paragraph (b and c statements), re- move „and implementation level‟. Replace the 6-item list by the following 2-item list and note: Page 2, Clause 3 Normative references ESC 02/05 02/15 04/05 Update the reference to the Unicode Bidirectional Algo- UCS-2 rithm and the Unicode Normalization Forms as follows: ESC 02/05 02/15 04/06 Unicode Standard Annex, UAX#9, The Unicode Bidi- rectional Algorithm, Version 5.1.0, March 2008. -
Psftx-Opensuse-15.2/Pancyrillic.F16.Psfu Linux Console Font Codechart
psftx-opensuse-15.2/pancyrillic.f16.psfu Linux console font codechart Glyphs 0x000 to 0x0FF 0 1 2 3 4 5 6 7 8 9 A B C D E F 0x00_ 0x01_ 0x02_ 0x03_ 0x04_ 0x05_ 0x06_ 0x07_ 0x08_ 0x09_ 0x0A_ 0x0B_ 0x0C_ 0x0D_ 0x0E_ 0x0F_ Page 1 Glyphs 0x100 to 0x1FF 0 1 2 3 4 5 6 7 8 9 A B C D E F 0x10_ 0x11_ 0x12_ 0x13_ 0x14_ 0x15_ 0x16_ 0x17_ 0x18_ 0x19_ 0x1A_ 0x1B_ 0x1C_ 0x1D_ 0x1E_ 0x1F_ Page 2 Font information 0x017 U+221E INFINITY Filename: psftx-opensuse-15.2/pancyrillic.f16.p 0x018 U+2191 UPWARDS ARROW sfu PSF version: 1 0x019 U+2193 DOWNWARDS ARROW Glyph size: 8 × 16 pixels 0x01A U+2192 RIGHTWARDS ARROW Glyph count: 512 Unicode font: Yes (mapping table present) 0x01B U+2190 LEFTWARDS ARROW 0x01C U+2039 SINGLE LEFT-POINTING Unicode mappings ANGLE QUOTATION MARK 0x000 U+FFFD REPLACEMENT 0x01D U+2040 CHARACTER TIE CHARACTER 0x01E U+25B2 BLACK UP-POINTING 0x001 U+2022 BULLET TRIANGLE 0x002 U+25C6 BLACK DIAMOND, 0x01F U+25BC BLACK DOWN-POINTING U+2666 BLACK DIAMOND SUIT TRIANGLE 0x003 U+2320 TOP HALF INTEGRAL 0x020 U+0020 SPACE 0x004 U+2321 BOTTOM HALF INTEGRAL 0x021 U+0021 EXCLAMATION MARK 0x005 U+2013 EN DASH 0x022 U+0022 QUOTATION MARK 0x006 U+2014 EM DASH 0x023 U+0023 NUMBER SIGN 0x007 U+2026 HORIZONTAL ELLIPSIS 0x024 U+0024 DOLLAR SIGN 0x008 U+201A SINGLE LOW-9 QUOTATION 0x025 U+0025 PERCENT SIGN MARK 0x026 U+0026 AMPERSAND 0x009 U+201E DOUBLE LOW-9 QUOTATION MARK 0x027 U+0027 APOSTROPHE 0x00A U+2018 LEFT SINGLE QUOTATION 0x028 U+0028 LEFT PARENTHESIS MARK 0x00B U+2019 RIGHT SINGLE QUOTATION 0x029 U+0029 RIGHT PARENTHESIS MARK 0x02A U+002A ASTERISK -
Cyrillic # Version Number
############################################################### # # TLD: xn--j1aef # Script: Cyrillic # Version Number: 1.0 # Effective Date: July 1st, 2011 # Registry: Verisign, Inc. # Address: 12061 Bluemont Way, Reston VA 20190, USA # Telephone: +1 (703) 925-6999 # Email: [email protected] # URL: http://www.verisigninc.com # ############################################################### ############################################################### # # Codepoints allowed from the Cyrillic script. # ############################################################### U+0430 # CYRILLIC SMALL LETTER A U+0431 # CYRILLIC SMALL LETTER BE U+0432 # CYRILLIC SMALL LETTER VE U+0433 # CYRILLIC SMALL LETTER GE U+0434 # CYRILLIC SMALL LETTER DE U+0435 # CYRILLIC SMALL LETTER IE U+0436 # CYRILLIC SMALL LETTER ZHE U+0437 # CYRILLIC SMALL LETTER ZE U+0438 # CYRILLIC SMALL LETTER II U+0439 # CYRILLIC SMALL LETTER SHORT II U+043A # CYRILLIC SMALL LETTER KA U+043B # CYRILLIC SMALL LETTER EL U+043C # CYRILLIC SMALL LETTER EM U+043D # CYRILLIC SMALL LETTER EN U+043E # CYRILLIC SMALL LETTER O U+043F # CYRILLIC SMALL LETTER PE U+0440 # CYRILLIC SMALL LETTER ER U+0441 # CYRILLIC SMALL LETTER ES U+0442 # CYRILLIC SMALL LETTER TE U+0443 # CYRILLIC SMALL LETTER U U+0444 # CYRILLIC SMALL LETTER EF U+0445 # CYRILLIC SMALL LETTER KHA U+0446 # CYRILLIC SMALL LETTER TSE U+0447 # CYRILLIC SMALL LETTER CHE U+0448 # CYRILLIC SMALL LETTER SHA U+0449 # CYRILLIC SMALL LETTER SHCHA U+044A # CYRILLIC SMALL LETTER HARD SIGN U+044B # CYRILLIC SMALL LETTER YERI U+044C # CYRILLIC -
Psftx-Opensuse-15.0/Pancyrillic.F16.Psfu Linux Console Font Codechart
psftx-opensuse-15.0/pancyrillic.f16.psfu Linux console font codechart Glyphs 0x000 to 0x0FF 0 1 2 3 4 5 6 7 8 9 A B C D E F 0x00_ 0x01_ 0x02_ 0x03_ 0x04_ 0x05_ 0x06_ 0x07_ 0x08_ 0x09_ 0x0A_ 0x0B_ 0x0C_ 0x0D_ 0x0E_ 0x0F_ Page 1 Glyphs 0x100 to 0x1FF 0 1 2 3 4 5 6 7 8 9 A B C D E F 0x10_ 0x11_ 0x12_ 0x13_ 0x14_ 0x15_ 0x16_ 0x17_ 0x18_ 0x19_ 0x1A_ 0x1B_ 0x1C_ 0x1D_ 0x1E_ 0x1F_ Page 2 Font information 0x017 U+221E INFINITY Filename: psftx-opensuse-15.0/pancyrillic.f16.p 0x018 U+2191 UPWARDS ARROW sfu PSF version: 1 0x019 U+2193 DOWNWARDS ARROW Glyph size: 8 × 16 pixels 0x01A U+2192 RIGHTWARDS ARROW Glyph count: 512 Unicode font: Yes (mapping table present) 0x01B U+2190 LEFTWARDS ARROW 0x01C U+2039 SINGLE LEFT-POINTING Unicode mappings ANGLE QUOTATION MARK 0x000 U+FFFD REPLACEMENT 0x01D U+2040 CHARACTER TIE CHARACTER 0x01E U+25B2 BLACK UP-POINTING 0x001 U+2022 BULLET TRIANGLE 0x002 U+25C6 BLACK DIAMOND, 0x01F U+25BC BLACK DOWN-POINTING U+2666 BLACK DIAMOND SUIT TRIANGLE 0x003 U+2320 TOP HALF INTEGRAL 0x020 U+0020 SPACE 0x004 U+2321 BOTTOM HALF INTEGRAL 0x021 U+0021 EXCLAMATION MARK 0x005 U+2013 EN DASH 0x022 U+0022 QUOTATION MARK 0x006 U+2014 EM DASH 0x023 U+0023 NUMBER SIGN 0x007 U+2026 HORIZONTAL ELLIPSIS 0x024 U+0024 DOLLAR SIGN 0x008 U+201A SINGLE LOW-9 QUOTATION 0x025 U+0025 PERCENT SIGN MARK 0x026 U+0026 AMPERSAND 0x009 U+201E DOUBLE LOW-9 QUOTATION MARK 0x027 U+0027 APOSTROPHE 0x00A U+2018 LEFT SINGLE QUOTATION 0x028 U+0028 LEFT PARENTHESIS MARK 0x00B U+2019 RIGHT SINGLE QUOTATION 0x029 U+0029 RIGHT PARENTHESIS MARK 0x02A U+002A ASTERISK -
MSR-4: Annotated Repertoire Tables, Non-CJK
Maximal Starting Repertoire - MSR-4 Annotated Repertoire Tables, Non-CJK Integration Panel Date: 2019-01-25 How to read this file: This file shows all non-CJK characters that are included in the MSR-4 with a yellow background. The set of these code points matches the repertoire specified in the XML format of the MSR. Where present, annotations on individual code points indicate some or all of the languages a code point is used for. This file lists only those Unicode blocks containing non-CJK code points included in the MSR. Code points listed in this document, which are PVALID in IDNA2008 but excluded from the MSR for various reasons are shown with pinkish annotations indicating the primary rationale for excluding the code points, together with other information about usage background, where present. Code points shown with a white background are not PVALID in IDNA2008. Repertoire corresponding to the CJK Unified Ideographs: Main (4E00-9FFF), Extension-A (3400-4DBF), Extension B (20000- 2A6DF), and Hangul Syllables (AC00-D7A3) are included in separate files. For links to these files see "Maximal Starting Repertoire - MSR-4: Overview and Rationale". How the repertoire was chosen: This file only provides a brief categorization of code points that are PVALID in IDNA2008 but excluded from the MSR. For a complete discussion of the principles and guidelines followed by the Integration Panel in creating the MSR, as well as links to the other files, please see “Maximal Starting Repertoire - MSR-4: Overview and Rationale”. Brief description of exclusion -
MSR-3-Annotated-Non-CJK-Tables-20180115
Maximal Starting Repertoire - MSR-3 Annotated Repertoire Tables, Non-CJK Integration Panel Date: 2018-1-15 How to read this file: This file shows all non-CJK characters that are included in the MSR-3 with a yellow background. The set of these code points matches the repertoire specified in the XML format of the MSR. Where present, annotations on individual code points indicate some or all of the languages a code point is used for. This file lists only those Unicode blocks containing non-CJK code points included in the MSR. Code points listed in this document, which are PVALID in IDNA2008 but excluded from the MSR for various reasons are shown with pinkish annotations indicating the primary rationale for excluding the code points, together with other information about usage background, where present. Code points shown with a white background are not PVALID in IDNA2008. Repertoire corresponding to the CJK Unified Ideographs: Main (4E00-9FFF), Extension-A (3400-4DBF), Extension B (20000-2A6DF), and Hangul Syllables (AC00-D7A3) are included in separate files. For links to these files see "Maximal Starting Repertoire - MSR-3: Overview and Rationale". How the repertoire was chosen: This file only provides a brief categorization of code points that are PVALID in IDNA2008 but excluded from the MSR. For a complete discussion of the principles and guidelines followed by the Integration Panel in creating the MSR, as well as links to the other files, please see “Maximal Starting Repertoire - MSR-3: Overview and Rationale”. Brief description of exclusion -
Building a Database of Uralic Languages
Languages under the influence: Building a database of Uralic languages Eszter Simon, Nikolett Mus Research Institute for Linguistics Hungarian Academy of Sciences {simon.eszter, mus.nikolett}@nytud.mta.hu Abstract For most of the Uralic languages, there is a lack of systematically collected, consequently transcribed and morphologically annotated text corpora. This pa- per sums up the steps, the preliminary results and the future directions of build- ing a linguistic corpus of some Uralic languages, namely Tundra Nenets, Udmurt, Synya Khanty, and Surgut Khanty. The experiences of building a corpus con- taining both old and modern, and written and oral data samples are discussed. Principles concerning data collection strategies of languages with different level of vitality and endangerment are discussed. Methodologies and challenges of data processing, and the levels of linguistic annotation are also described in detail. 1 Introduction This paper sums up the steps, the preliminary results and the future directionsof building a linguistic corpus of some Uralic languages within the research project called Languages under the Influence. The project started in February 2016 and lasts until July 2017 and is funded by the Hungarian National Research, Development and Innovation Office (grant ID: ERC_HU_15 118079). It is a pre-ERC project getting national support to enter the European Research Council (ERC) programme¹, thus it is a pilot project. Its aim is to create the theoretical This work is licensed under a Creative Commons Attribution–NoDerivatives 4.0 International Licence. Licence details: http://creativecommons.org/licenses/by-nd/4.0/ ¹https://erc.europa.eu/ 1 10 The 3rd International Workshop for Computational Linguistics of Uralic Languages, pages 10–24, St. -
1 TO: the Unicode Technical Committee and ISO/IEC JTC1/SC2
TO: The Unicode Technical Committee and ISO/IEC JTC1/SC2 WG2 L2/12‐052 FROM: Deborah Anderson (SEI, UC Berkeley) on behalf of Tapani Salminen DATE: 30 January 2012 RE: Request for 2 New Cyrillic Characters for the Khanty and Nenets Languages Summary: Tapani Salminen, a Finno‐Ugrian language specialist with expertise in the Forest Nenets, Northern Khanty, and Eastern Khanty languages, provides evidence in this document from recent native publications for 2 Cyrillic characters that are not yet in Unicode, but which are needed by the native communities in Siberia to represent their languages. The two characters are: CYRILLIC CAPITAL LETTER EL WITH DESCENDER CYRILLIC SMALL LETTER EL WITH DESCENDER Background on the encoding of EL WITH DESCENDER The CYRILLIC LETTER EL WITH DESCENDER, a fricolateral, is only found in a small number of languages in the Far North (Siberia), including Northern Khanty, Eastern Khanty, and Forest Nenets. Because the history of encoding of this character is closely tied to encoding CYRILLIC EL characters with various “appendages”, the discussion below traces the history of the encoding of these other characters. The current set of forms of the Cyrillic letter EL with an “appendage” as they appear in the Unicode Standard include: Khanty letters The following characters are not yet encoded: ‐‐‐‐‐‐‐ CYRILLIC CAPITAL LETTER EL WITH DESCENDER ‐‐‐‐‐‐‐ CYRILLIC SMALL LETTER EL WITH DESCENDER The upper and lowercase EL WITH DESCENDER characters were first proposed in 1997 (N1590) and then 1 again in 1998 (N17441) as a character needed for the Kildin Sámi language. The small and capital characters were approved by the character encoding committees and appeared on a ballot for an amendment to 10646 (ISO/IEC 10646‐1: 1993/Amd. -
Cyrillic Range: 0400–04FF
Cyrillic Range: 0400–04FF This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 14.0 This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See https://www.unicode.org/errata/ for an up-to-date list of errata. See https://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See https://www.unicode.org/charts/PDF/Unicode-14.0/ for charts showing only the characters added in Unicode 14.0. See https://www.unicode.org/Public/14.0.0/charts/ for a complete archived file of character code charts for Unicode 14.0. Disclaimer These charts are provided as the online reference to the character contents of the Unicode Standard, Version 14.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 14.0, online at https://www.unicode.org/versions/Unicode14.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, #44, #45, and #50, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online. See https://www.unicode.org/ucd/ and https://www.unicode.org/reports/ A thorough understanding of the information contained in these additional sources is required for a successful implementation.