Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
UTR #23: the Unicode Character Property Model
Technical Reports Proposed Update Unicode Technical Report #23 Author Ken Whistler ([email protected]) and Asmus Freytag Editors ([email protected]) Date 2014-08-12 This Version http://www.unicode.org/reports/tr23/tr23-10.html Previous http://www.unicode.org/reports/tr23/tr23-9.html Version Latest http://www.unicode.org/reports/tr23/ Version Revision 10 Summary This document presents a conceptual model of character properties defined in the Unicode Standard. Status This document is a proposed update of a previously approved Unicode Technical Report. This document may be updated, replaced, or superseded by other documents at any time. Publication does not imply endorsement by the Unicode Consortium. This is not a stable document; it is inappropriate to cite this document as other than a work in progress. A Unicode Technical Report (UTR) contains informative material. Conformance to the Unicode Standard does not imply conformance to any UTR. Other specifications, however, are free to make normative references to a UTR. Please submit corrigenda and other comments with the online reporting form [Feedback]. Related information that is useful in understanding this document is found in the References. For the latest version of the Unicode Standard see [Unicode]. For a list of current Unicode Technical Reports see [Reports]. For more information about versions of the Unicode Standard, see [Versions]. Contents 1. Scope 2. Overview 2.1 Origin of Character Properties 2.2 Character Behavior in Context 2.3 Relation of Character Properties to Algorithms 2.4 Code Point Properties and Abstract Character Properties 2.5 Normative Properties 2.6 Informative Properties 2.7 Referring to Properties 2.8 The Unicode Character Database 3. -
Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only. -
Kindergarten Specials Activities
Kindergarten Specials Activities Art Music P.E. S.T.E.M Activity 1: Find primary Activity 1: Teach someone a Activity 1: Same Spot, Sock Shot Science: Take a nature walk and collect song you have learned in Using clean pair of balled up socks some natural materials. When you get colors (red, yellow, blue) in practice tossing under hand and then home sort the materials you your house. music class. If the song had a overhand while stepping with the found. Below are some examples of how game, teach the game as well. opposite foot to a target. Please ask you might sort. If you have time after adult permission to use certain sorting try building something cool with the natural materials. targets at your home. Examples: K-1st: sort by shape, color, or size laundry hamper, spot on wall/tile on 2nd-3rd: sort by texture, shape AND floor, or family members make a size, or living vs non-living hoop with their arms. 4th-6th: Have students create their own way of sorting and then challenge someone elseto identify their method. Activity 2: I SPY: Find Activity 2: Create your own Activity 2: Cardio Day Technology: Have someone in your instrument. You can draw it 5 Minute Morning Dance house pretend to be a robot. You secondary colors outside should program the robot how to (green, orange, purple) out and explain what it would Party move around the house. Remember be made out of, how it would Evening Running Competition; robots cannot make any decisions for work, and how it would How many laps can each themselves, they only can do what sound. -
TJ Specials Bingo Card 3-5 Week 3.Xlsx
A B C D E 1 Specials Bingo: 3rd-5th Grade 2 April 20-26 3 Art PE Music Art PE With a family member Turn your self or Draw a large flower. each person has a someone in your Fill each petal with a paper, see who can list house into the Mona Watch a show that you different line/shape the most exercise Lisa. Use whatever like. Every time pattern. Trace each Watch a musical as a movements in 30 items you have to someone on the show one with different family. seconds! Then after make them look like laughs do five jumping colored markers, time is up, do two of her. Take picture if jacks. crayons, pencils or each exercise on the you can and email it to pens. 4 lists. me. :) 5 Date:________ Date:________ Date:________ Date:________ Date:________ 6 Music PE Art Music Art 2 DIMENSIONAL and Draw a picture of your choice. Do 5 squats in each room of your Color it using only blues. Use as Listen to a style of 3 DIMENSIONAL - house. Then do 5 mountain many blue markers, crayons, pens, climbers in each room. Lastly 5 music that you Find the folowing in Complete your weekly pencils that you can find. Using jumping jacks in each room. Extra one color family is called normally do not and your house: listening journal. challenge. Time yourself while you MONOCHROMATIC. This is how do this... try a second time and see write down 5 things 2D/3D=square/cube we did our feathers at the if you can beat your first score. -
CEN WORKSHOP Agreementfinal Draft for CWA/MES:1998
CEN WORKSHOP AGREEMENTFinal draft for CWA/MES:1998 1998-11-18 English version Information technology – Multilingual European Subsets in ISO/IEC 10646-1 Technologies de l’information – Informationstechnologie – Jeux partiels européens multilingues Mehrsprachige europäische Untermengen dans l’ISO/CEI 10646-1 in ISO/IEC 10646-1 This CEN Workshop Agreement has been drafted and approved by a Workshop of representatives of interested parties, whose names and affiliations can be obtained from the CEN/ISSS Secretariat. The formal process followed by the Workshop in the development of this Workshop Agreement has been endorsed by the National Members of CEN, but neither the National Members of CEN nor the CEN Central Secretariat can be held accountable for the technical content of this CEN Workshop Agreement or for possible conflicts with standards or legislation. This CEN Workshop Agreement can in no way be held as being an official standard developed by CEN and its Members. This CEN Workshop Agreement is publicly available, as a reference document, from the CEN Members National Standard Bodies. CEN Members are the National Standards Bodies of Austria, Belgium, the Czech Republic, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland and the United Kingdom. CEN EUROPEAN COMMITTEE FOR STANDARDIZATION COMITÉ EUROPÉEN DE NORMALISATION EUROPÄISCHES KOMITEE FÜR NORMUNG Central Secretariat: rue de Stassart 36, B-1050 Brussels © CEN 1998 All rights of exploitation in any form and by any means reserved worldwide for CEN national Members Ref.No. CWA/MES:1998 E Information technology – Page 2 Multilingual European Subsets in ISO/IEC 10646-1 Final Draft for CWA/MES:1998 Contents Foreword 3 Introduction 4 1. -
Representing Myanmar in Unicode Details and Examples Version 3
Representing Myanmar in Unicode Details and Examples Version 3 Martin Hosken1 Table of Contents Introduction................................................................................................................................................2 Unicode 5.1 Model.....................................................................................................................................4 Advanced Issues.......................................................................................................................................11 Languages.................................................................................................................................................14 Burmese....................................................................................................................................................15 Old Burmese.............................................................................................................................................18 Sanskrit/Pali..............................................................................................................................................20 Mon...........................................................................................................................................................22 Sgaw Karen...............................................................................................................................................24 Western Pwo Karen..................................................................................................................................26 -
CEN WORKSHOP AGREEMENT CWA 13873:2000 Multilingual
CEN WORKSHOP AGREEMENT CWA 13873:2000 2000-03-01 English version Information technology – Multilingual European Subsets in ISO/IEC 10646-1 Technologies de l’information – Informationstechnologie – Jeux partiels européens multilingues Mehrsprachige europäische Untermengen dans l’ISO/CEI 10646-1 in ISO/IEC 10646-1 This CEN Workshop Agreement has been drafted and approved by a Workshop of representatives of interested parties, whose names and affiliations can be obtained from the CEN/ISSS Secretariat. The formal process followed by the Workshop in the development of this Workshop Agreement has been endorsed by the National Members of CEN, but neither the National Members of CEN nor the CEN Central Secretariat can be held accountable for the technical content of this CEN Workshop Agreement or for possible conflicts with standards or legislation. This CEN Workshop Agreement can in no way be held as being an official standard developed by CEN and its Members. This CEN Workshop Agreement is publicly available, as a reference document, from the CEN Members National Standard Bodies. CEN Members are the National Standards Bodies of Austria, Belgium, the Czech Republic, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland and the United Kingdom. CEN EUROPEAN COMMITTEE FOR STANDARDIZATION COMITÉ EUROPÉEN DE NORMALISATION EUROPÄISCHES KOMITEE FÜR NORMUNG Central Secretariat: rue de Stassart 36, B-1050 Brussels © CEN 2000 All rights of exploitation in any form and by any means reserved worldwide for CEN national Members Ref.No. CWA 13873:2000 E Page 2 CWA 13873:2000 Contents Foreword 3 Introduction 4 1. Scope 5 2. -
UTF-8 from Wikipedia, the Free Encyclopedia
UTF-8 From Wikipedia, the free encyclopedia UTF-8 is a character encoding capable of encoding all possible characters, or code points, defined by Unicode and originally designed by Ken Thompson and Rob Pike.[1] The encoding is variable-length and uses 8-bit code units. It was designed for backward compatibility with ASCII and to avoid the complications of endianness and byte order marks in the alternative UTF-16 and UTF-32 encodings. The name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8- bit.[2] UTF-8 is the dominant character encoding for the World Wide Web, accounting for 89.1% of all Web pages in May 2017 (the most popular East Asian encodings, Shift JIS and GB 2312, have 0.9% and 0.7% respectively).[4][5][3] The Internet Mail Consortium (IMC) recommended that all e-mail programs be able to display and create mail using UTF-8,[6] and the W3C recommends UTF-8 as the default encoding in XML and HTML.[7] UTF-8 encodes each of the 1,112,064[8] valid code points in Unicode using one to four 8-bit bytes.[9] Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single octet with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well. Since ASCII bytes do not occur when encoding non-ASCII code points into UTF-8, UTF-8 is safe to use within most programming and document languages that interpret certain ASCII characters in a special way, such as '/' in filenames, '\' in escape sequences, and '%' in printf. -
Character Properties 4
The Unicode® Standard Version 14.0 – Core Specification To learn about the latest version of the Unicode Standard, see https://www.unicode.org/versions/latest/. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trade- mark claim, the designations have been printed with initial capital letters or in all capitals. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries. The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. © 2021 Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at https://www.unicode.org/reporting.html. For information about the Unicode terms of use, please see https://www.unicode.org/copyright.html. The Unicode Standard / the Unicode Consortium; edited by the Unicode Consortium. — Version 14.0. Includes index. ISBN 978-1-936213-29-0 (https://www.unicode.org/versions/Unicode14.0.0/) 1. -
About the Code Charts 24
The Unicode® Standard Version 13.0 – Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trade- mark claim, the designations have been printed with initial capital letters or in all capitals. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries. The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. © 2020 Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at http://www.unicode.org/reporting.html. For information about the Unicode terms of use, please see http://www.unicode.org/copyright.html. The Unicode Standard / the Unicode Consortium; edited by the Unicode Consortium. — Version 13.0. Includes index. ISBN 978-1-936213-26-9 (http://www.unicode.org/versions/Unicode13.0.0/) 1. -
Unicode Character Properties
Unicode character properties Document #: P1628R0 Date: 2019-06-17 Project: Programming Language C++ Audience: SG-16, LEWG Reply-to: Corentin Jabot <[email protected]> 1 Abstract We propose an API to query the properties of Unicode characters as specified by the Unicode Standard and several Unicode Technical Reports. 2 Motivation This API can be used as a foundation for various Unicode algorithms and Unicode facilities such as Unicode-aware regular expressions. Being able to query the properties of Unicode characters is important for any application hoping to correctly handle any textual content, including compilers and parsers, text editors, graphical applications, databases, messaging applications, etc. static_assert(uni::cp_script('C') == uni::script::latin); static_assert(uni::cp_block(U'[ ') == uni::block::misc_pictographs); static_assert(!uni::cp_is<uni::property::xid_start>('1')); static_assert(uni::cp_is<uni::property::xid_continue>('1')); static_assert(uni::cp_age(U'[ ') == uni::version::v10_0); static_assert(uni::cp_is<uni::property::alphabetic>(U'ß')); static_assert(uni::cp_category(U'∩') == uni::category::sm); static_assert(uni::cp_is<uni::category::lowercase_letter>('a')); static_assert(uni::cp_is<uni::category::letter>('a')); 3 Design Consideration 3.1 constexpr An important design decision of this proposal is that it is fully constexpr. Notably, the presented design allows an implementation to only link the Unicode tables that are actually used by a program. This can reduce considerably the size requirements of an Unicode-aware executable as most applications often depend on a small subset of the Unicode properties. While the complete 1 Unicode database has a substantial memory footprint, developers should not pay for the table they don’t use. It also ensures that developers can enforce a specific version of the Unicode Database at compile time and get a consistent and predictable run-time behavior. -
Chapter 6, Writing Systems and Punctuation
The Unicode® Standard Version 13.0 – Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trade- mark claim, the designations have been printed with initial capital letters or in all capitals. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries. The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. © 2020 Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at http://www.unicode.org/reporting.html. For information about the Unicode terms of use, please see http://www.unicode.org/copyright.html. The Unicode Standard / the Unicode Consortium; edited by the Unicode Consortium. — Version 13.0. Includes index. ISBN 978-1-936213-26-9 (http://www.unicode.org/versions/Unicode13.0.0/) 1.