N3921 ISO/IEC JTC 1/SC 2 N 4156 Date: 2010-09-24
Recommended publications
-
ISO/IEC JTC1/SC2/WG2 N 3936 L2/10-385
ISO/IEC JTC1/SC2/WG2 N 3936 Date: 2010-10-06 ISO/IEC JTC1/SC2/WG2 Coded Character Set Secretariat: Japan (JISC) Doc. Type: Disposition of comments Title: Disposition of comments on SC2 N 4146 (ISO/IEC CD 10646, 3rd Ed. Information Technology – Universal Coded Character Set (UCS)) Source: Michel Suignard (project editor) Project: JTC1 02.10646.00.00.00.03 Status: For review by WG2 Date: 2010-09-24 Distribution: WG2 Reference: SC2 N4146, N4156, WG2 N3892 Medium: Paper, PDF file Comments were received from Armenia, China, Egypt, Ireland, Japan, Korea (ROK), Norway, and U.S.A. The following document is the disposition of those comments. The disposition is organized per country. Note – The full content of the ballot comments have been included in this document to facilitate the reading. The dispositions are inserted in between these comments and are marked in Underlined Bold Serif text, with explanatory text in italicized serif. As a result of these dispositions all countries with negative vote have changed their vote to positive. Page 1 of 20 Armenia: comments Technical comments T1. a) Armenian Dram Sign Upon consultation with the local specialist and the Armenian Dram Sign author SARM decided to stay with its request to place the sign in the "Currency Symbols" range 20A0-20CF at the available position 20B9. One of the main reasons for that is that the currency symbols are united in one and the same block on the basis of the main elements repeated in those things, and not on the basis of national alphabets or scripts. In other words the signs in this range are grouped in accordance with their functionality alike the three-letter abbreviations for the monetary instruments. -
ISO/IEC JTC1/SC2/WG2 N 4098 Date: 2011-06-06
ISO/IEC JTC1/SC2/WG2 N 4098 Date: 2011-06-06 ISO/IEC JTC1/SC2/WG2 Coded Character Set Secretariat: Japan (JISC) Doc. Type: Disposition of comments Title: Disposition of comments on SC2 N 4168 (ISO/IEC FCD 10646 3rd Ed., Information Technology – Universal Coded Character Set (UCS)) Source: Michel Suignard (project editor) Project: JTC1 02.10646.00.00.00.03 Status: For review by WG2 Date: 2011-06-06 Distribution: WG2 Reference: SC2 N4168, N4181, WG2 N4023 Medium: Paper, PDF file Comments were received from Armenia, Egypt, Germany, Ireland, Japan, Korea (ROK), U.K, and U.S.A. The following document is the draft disposition of those comments. The disposition is organized per country. Note – The full content of the ballot comments have been included in this document to facilitate the reading. The dispositions are inserted in between these comments and are marked in Underlined Bold Serif text, with explanatory text in italicized serif. As a result of these dispositions, Japan, Korea (ROK), and the USA changed their vote to positive. This only leaves one negative vote (Ireland). Page 1 of 16 Armenia (comment not related to a vote) Technical comments T1. Currency symbols It is clear idea to combine all the signs, including the monetary one, in one and the same national block, however it is preferable to place Armenian Dram Symbol into Currency Symbols table. Armenian Dram symbol has two horizontal strokes like the majority of the symbols in that table, and those symbols are grouped there together on the basis of functionality and symbolism. Noted The preference for moving the Armenian Dram symbol is noted. -
The Unicode Standard, Version 4.0--Online Edition
This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consortium and published by Addison-Wesley. The material has been modified slightly for this online edition, however the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/standard/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten used as the source of reference Kanji codes was written by Tetsuji Morohashi and published by Taishukan Shoten. -
The Unicode Standard, Version 3.0, Issued by the Unicode Consortium and Published by Addison-Wesley
The Unicode Standard Version 3.0 The Unicode Consortium ADDISON–WESLEY An Imprint of Addison Wesley Longman, Inc. Reading, Massachusetts · Harlow, England · Menlo Park, California Berkeley, California · Don Mills, Ontario · Sydney Bonn · Amsterdam · Tokyo · Mexico City Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations. The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. If these files have been purchased on computer-readable media, the sole remedy for any claim will be exchange of defective media within ninety days of receipt. Dai Kan-Wa Jiten used as the source of reference Kanji codes was written by Tetsuji Morohashi and published by Taishukan Shoten. ISBN 0-201-61633-5 Copyright © 1991-2000 by Unicode, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher or Unicode, Inc. -
East Asian Scripts (Écritures de l'Extrême-Orient)
Chapter 11 East Asian Scripts The ideographic writing system developed in China two thousand years BC is the origin of all the writing systems used in East Asia today. Designed to write the various Chinese dialects, ideographic writing spread to the countries within the Chinese sphere of influence and was used for centuries to write languages other than Chinese, such as Japanese, Korean, and Vietnamese. Speakers of other languages, such as the Yi, created their own ideographic systems in imitation of Chinese. Chinese, a largely monosyllabic and uninflected language, is perfectly suited to ideographic writing systems. Ideographs are less well suited to other types of languages. The Japanese solved this problem by creating two syllabic scripts, hiragana and katakana. The Koreans invented an alphabetic system, called hangul, in which the letters are grouped into syllabic blocks that resemble ideographs. Until the twentieth century, Vietnamese was written only with ideographs. This script, sometimes called Annamite, was later replaced by an alphabet derived from the Latin script, which Alexandre de Rhodes, a French Jesuit, had codified as early as the seventeenth century. Japanese still makes extensive use of ideographs, called kanji; they are rarer in Korean, where ideographs are called hanja. In mainland China, the government seeks to promote the use of modern, simplified ideographs in place of the older, more traditional ones used in Taiwan and in overseas Chinese communities. -
Chinese Language Processing on the Computer (Chinese Taalverwerking op de Computer)
FACULTY OF ARTS DEPARTMENT OF ORIENTAL AND SLAVIC STUDIES KATHOLIEKE UNIVERSITEIT LEUVEN CHINESE LANGUAGE PROCESSING ON THE COMPUTER Part I: Theoretical Overview Supervisor: Prof. Dr. Fred Truyen Thesis submitted to obtain the degree of licentiate in Sinology by: Sébastien Bruggeman - 2001-2002 - FOREWORD This theoretical overview deals with Chinese language processing on the computer. It aims to be as complete as possible, but unfortunately it never can be, given the breadth of the subject. Although this part contains many technical details, no prior knowledge is required. Alongside this theoretical overview there is also a practical guide for people who want to use Chinese on their computer in practice. No prior knowledge is required for that part either, although a basic knowledge of Microsoft Windows is assumed. Having a computer with an internet connection at hand makes it possible to put everything into practice immediately. The third part of this thesis is a website, where additional documentation, examples, and links can be found. The forum there is also available for further questions and answers. Finally, I wish you pleasant reading and hope that this licentiate thesis gives you a better view of Chinese language processing on the computer. Sébastien Bruggeman TABLE OF CONTENTS 0. Conventions used -
ISO/IEC JTC1/SC2/WG2 N 3892 Date: 2010-09-24
ISO/IEC JTC1/SC2/WG2 N 3892 Date: 2010-09-24 ISO/IEC JTC1/SC2/WG2 Coded Character Set Secretariat: Japan (JISC) Doc. Type: Draft disposition of comments Title: Draft disposition of comments on SC2 N 4146 (ISO/IEC CD 10646, 3rd Ed. Information Technology – Universal Coded Character Set (UCS)) Source: Michel Suignard (project editor) Project: JTC1 02.10646.00.00.00.03 Status: For review by WG2 Date: 2010-09-24 Distribution: WG2 Reference: SC2 N4146, N4156 Medium: Paper, PDF file Comments were received from Armenia, China, Egypt, Ireland, Japan, Korea (ROK), Norway, and U.S.A. The following document is the draft disposition of those comments. The disposition is organized per country. Note – The full content of the ballot comments have been included in this document to facilitate the reading. The dispositions are inserted in between these comments and are marked in Underlined Bold Serif text, with explanatory text in italicized serif. Page 1 of 20 Armenia: comments Technical comments T1. a) Armenian Dram Sign Upon consultation with the local specialist and the Armenian Dram Sign author SARM decided to stay with its request to place the sign in the "Currency Symbols" range 20A0-20CF at the available position 20B9. One of the main reasons for that is that the currency symbols are united in one and the same block on the basis of the main elements repeated in those things, and not on the basis of national alphabets or scripts. In other words the signs in this range are grouped in accordance with their functionality alike the three-letter abbreviations for the monetary instruments. -
Text Mining: Extraction of Interesting Association Rule with Frequent Itemsets Mining for Korean Language from Unstructured Data
International Journal of Multimedia and Ubiquitous Engineering Vol.10, No.11 (2015), pp.11-20 http://dx.doi.org/10.14257/ijmue.2015.10.11.02 Text Mining: Extraction of Interesting Association Rule with Frequent Itemsets Mining for Korean Language from Unstructured Data Irfan Ajmal Khan, Junghyun Woo, Ji-Hoon Seo and Jin-Tak Choi Department of Computer Engineering, INU (Incheon National University), Incheon, South Korea Abstract Text mining is a method for extracting knowledge from structured and unstructured data. The extracted knowledge can then be used for further analysis and discovery. This paper presents a method for extracting information from unstructured text data, explains the importance of Association Rules Mining, specifically for Korean-language text, and describes NLP (Natural Language Processing) tools. With some modifications, Association Rules Mining (ARM) can also be used to mine associations between itemsets in unstructured data, which in turn can help to generate a statistical thesaurus, to mine grammatical rules, and to search large data efficiently. Although various association rules mining techniques have been used successfully for market basket analysis, very few have been applied to Korean text. The proposed Korean language mining method calculates and extracts meaningful patterns (association rules) between words and presents the hidden knowledge. First it cleans and integrates the data, selects the relevant data, and transforms it into a transactional database. Then data mining techniques are applied to the data source to extract hidden patterns. These patterns are evaluated against specific rules until a valid and satisfactory result is obtained. -
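The pipeline this abstract outlines (turn text into transactions, find frequent itemsets, then derive association rules) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `frequent_itemsets` and `association_rules`, the sample sentences, the thresholds, and the whitespace tokenization are all assumptions made for illustration. Real Korean text would normally need a morphological analyzer rather than `str.split`.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size words.

    Brute-force counting, adequate only for tiny illustrative inputs.
    """
    n = len(transactions)
    counts = Counter()
    for words in transactions:
        items = sorted(set(words))  # one occurrence per word per document
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

def association_rules(supports, min_confidence):
    """Derive rules (antecedent, consequent, confidence) from frequent itemsets."""
    rules = []
    for itemset, support in supports.items():
        if len(itemset) < 2:
            continue
        for i, item in enumerate(itemset):
            antecedent = itemset[:i] + itemset[i + 1:]
            # confidence(A -> B) = support(A and B) / support(A)
            confidence = support / supports[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, (item,), confidence))
    return rules

# Hypothetical sample documents, tokenized on whitespace (an assumption).
docs = ["서울 날씨 맑음", "서울 날씨 비", "부산 날씨 맑음", "서울 교통 혼잡"]
transactions = [d.split() for d in docs]

supports = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(supports, min_confidence=0.6)
for antecedent, consequent, confidence in rules:
    print(antecedent, "->", consequent, round(confidence, 2))
```

Each document is treated as one transaction and each distinct word as one item, which mirrors the market-basket framing mentioned in the abstract. The brute-force counting stands in for a proper frequent-itemset algorithm such as Apriori or FP-growth.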
ISO/IEC JTC/SC2/WG2 N4103-AI 2012-01-03
ISO/IEC JTC/SC2/WG2 N4103-AI 2012-01-03 Action Items from ISO/IEC JTC1/SC2/WG2 meeting 58, Helsinki, Finland; 2011-06-06/10 For review at meeting 59, Mountain View, CA, USA, 2012-02-16/20 (the following is an extract of Action items section of document N4103, meeting 58 unconfirmed minutes) V.S. Umamaheswaran, Recording Secretary 15 Action items All the action items recorded in the minutes of the previous meetings from 25 to 51, and, 53 to 55 have been either completed or dropped. Status of outstanding action items from previous meetings 52, 56, 57 and new action items from the last meeting 58 are listed in the tables below. Meeting 25, 1994-04-18/22, Antalya, Turkey (document N1033) Meeting 26, 1994-10-10/14, San Francisco, CA, USA (document N1117) Meeting 27, 1995-04-03/07, Geneva, Switzerland (document N1203) Meeting 28, 1995-06-22/26, Helsinki, Finland (document N1253) Meeting 29, 1995-11-06/10, Tokyo, Japan (document N1303) Meeting 30, 1996-04-22/26, Copenhagen, Denmark (document N1353) Meeting 31, 1996-08-12/16, Québec City, Canada (document N1453) Meeting 32, 1997-01-20/24, Singapore (document N1503) Meeting 33, 1997-06-30/07-04, Heraklion, Crete, Greece (document N1603) Meeting 34, 1998-03-16/20, Redmond, WA, USA (document N1703) Meeting 35, 1998-09-21/25, London, UK (document N1903) Meeting 36, 1999-03-09/15, Fukuoka, Japan (document N2003) Meeting 37, 1999-09-17/21, Copenhagen, Denmark (document N2103) Meeting 38, 2000-07-18/21, Beijing, China (document N2203) Meeting 39, 2000-10-08/11, Vouliagmeni, Athens, Greece (document