XML & JSON: Interchangeability and Case Studies

Part 1: from text to XML/JSON

Salvatore Cristofaro, Pietro Sichera and Daria Spampinato
Consiglio Nazionale delle Ricerche
Istituto di Scienze e Tecnologie della Cognizione, Catania
1st March 2021

Semantic web
• Classic web enhancement
• Information encoding
• Information ambiguity
• Information transfer systems
• Searching, maintaining and preserving reliable data
• Methods for data use and exchange
XML and JSON

XML and JSON
• Created for the exchange between client and server
• Readable
• Hierarchical
• Many tools that read and use them

XML and JSON: differences
XML
• Longer
• Needs a dedicated XML parser to be interpreted
• No data type “array”
JSON
• Shorter
• Can be parsed by a standard JavaScript function (JSON.parse)
• Native data type “array”

XML and JSON, or XML vs JSON?

Information encoding
• Communication
• Character encoding
• Text storing
• Text transmission

Information encoding: definitions
• String
• Repertoire of characters
• Charset

Information encoding
• Morse
• Enigma
• ASCII

Information encoding: ASCII
Binary: 01001000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100
Hex: 48 65 6C 6C 6F 20 77 6F 72 6C 64
Text: Hello world
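The relationship between a string, its ASCII bytes, and their binary and hexadecimal views can be sketched in a few lines of Python. This is only an illustration; the helper name `ascii_views` is invented for this example and does not come from the slides.

```python
def ascii_views(text: str) -> tuple[str, str]:
    """Return the binary and hexadecimal views of an ASCII string."""
    # encode() raises UnicodeEncodeError if the text contains non-ASCII characters
    data = text.encode("ascii")
    binary = " ".join(f"{byte:08b}" for byte in data)       # 8 bits per byte
    hexadecimal = " ".join(f"{byte:02X}" for byte in data)  # 2 hex digits per byte
    return binary, hexadecimal


binary, hexadecimal = ascii_views("Hello world")
print(binary)
print(hexadecimal)  # 48 65 6C 6C 6F 20 77 6F 72 6C 64
```

Each character maps to one byte, so the three views (text, hexadecimal, binary) always have the same number of items.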
Information encoding: ASCII
• From 128 to 256 characters (from 7 bits to 8 bits)
• Charsets from IBM, HP, Apple, Microsoft
• From code pages to ISO
• ISO vs ANSI

Information encoding: Unicode
• 143,859 characters
• Covering 154 modern and historic scripts
• Character encodings:
  • UTF-32
  • UTF-16
  • UTF-8

Information encoding: UTF-16
• 2 or 4 bytes per character
• 3 encoding schemes:
  • UTF-16
  • UTF-16LE (Little Endian)
  • UTF-16BE (Big Endian)

Information encoding: UTF-8
• 1-4 bytes per character
• 1,112,064 valid character code points in Unicode
• 1 byte: standard ASCII
• 2 bytes: Arabic, Hebrew, most European scripts
• 3 bytes: the rest of the BMP (Basic Multilingual Plane)
• 4 bytes: all remaining Unicode characters (outside the BMP)

Information encoding: Mojibake
[Figure: the UTF-8-encoded Japanese Wikipedia article for Mojibake as displayed if interpreted as Windows-1252 encoding]

Information encoding: UTF-8
• The most common encoding for the World Wide Web
• Accounting for 97% of all web pages
• Up to 100% for some languages

Data exchange: FAIR principles
• Findable
• Accessible
• Interoperable
• Reusable

Data exchange: CSV

CSV advantages
• CSV is human readable and easy to edit manually
• CSV is simple to implement and parse
• CSV is processed by almost all existing applications
• CSV provides a straightforward information schema
• CSV is faster to handle
• CSV is smaller in size
• CSV is considered to be a standard format
• CSV is compact. In XML you write a start tag and an end tag for each column in each row. In CSV you write the column headers only once.
• CSV is easy to generate

CSV disadvantages
• CSV can carry only the most basic data. Complex configurations cannot be imported and exported this way
• There is no distinction between text and numeric values
• No standard way to represent binary data
• Problems with importing CSV into SQL (no distinction between NULL and quotes)
• Poor support of special characters
• No standard way to represent control characters
• Lack of universal standard
• Field data may also contain commas or even embedded line-breaks

Data exchange: ISO/OSI

Data exchange: HTML - The Web 1.0
• www
• Tim Berners-Lee
• SGML
• Netscape vs Microsoft

Data exchange: HTML - The Web 1.0
• Programming language
• Standard markup language
• Web browser

Data exchange: HTML - The Web 1.0
• Syntax
• Semantics
• Representation
• Behaviour

Data exchange: HTML - The Web 1.0
[Figure: EUPORIA web page source]

Data exchange: XML - The Web 1.1
• eXtensible Markup Language
• Specification for the definition of markup languages
• World Wide Web Consortium (W3C)
• HTML as an XML application -> XHTML

Data exchange: XML - The Web 1.1
• Integrity of data in any XML document
• Technology to interoperate with any platform

Data exchange: The way to JSON: Java, .NET and AJAX
• Sun and Microsoft
• Java
• Object-oriented programming languages
• “Write once, run anywhere”
• .NET, C#
• XML to solve the data interoperability puzzle

Data exchange: The way to JSON: Java, .NET and AJAX
• AJAX: “Asynchronous JavaScript and XML”
• Communications in background
• Single-page Application (SPA)
• JavaScript for everyone
• Web 2.0

Data exchange: JSON
• HTML document containing some JavaScript
• Interoperability across all browsers
• Interchange of data between arbitrary languages

Data exchange: JSON
“XML is the most fully developed means of getting data in and out of an AJAX client, but there’s no reason you couldn’t accomplish the same effects using a technology like JavaScript Object Notation or any similar means of structuring data.”

XML vs JSON: XML
• eXtensible Markup Language
• Store and transport data
• Human- and machine-readable

XML vs JSON: XML vs HTML
• XML was designed to carry data
• HTML was designed to display data
• XML tags are not predefined
• HTML tags are predefined

XML vs JSON: XML
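As a rough illustration of XML carrying data with tags that are not predefined, the sketch below parses a small document with Python's standard `xml.etree.ElementTree` module. The document, its tag names (`library`, `book`, `title`, `year`) and the helper `read_books` are all invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag and attribute names are made up by the author
# of the data, which is exactly what "tags are not predefined" means.
DOCUMENT = """<library>
  <book id="b1">
    <title>XML basics</title>
    <year>2021</year>
  </book>
  <book id="b2">
    <title>JSON basics</title>
    <year>2020</year>
  </book>
</library>"""


def read_books(xml_text: str) -> list[dict[str, str]]:
    """Parse the XML text and return one dictionary per <book> element."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well formed
    books = []
    for book in root.findall("book"):
        books.append({
            "id": book.get("id", ""),                     # attribute value
            "title": book.findtext("title", default=""),  # child element text
            "year": book.findtext("year", default=""),
        })
    return books


for entry in read_books(DOCUMENT):
    print(entry)
```

Note that every value comes back as a string: plain XML has no native number or array type, so any typing is left to the application or to a schema.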
XML vs JSON: XML syntax rules
• Documents must have a root element
• Prolog is optional
• All elements must have a closing tag
• Elements must be properly nested
• Attribute values must always be quoted
• A document that follows these rules is well formed

XML vs JSON: XML elements and attributes
• An element can contain:
  • text
  • attributes
  • other elements
  • or a mix of the above
• Attribute values must be quoted
• Avoid attributes (if unnecessary):
  • attributes cannot contain multiple values (elements can)
  • attributes cannot contain tree structures (elements can)
  • attributes are not easily expandable (for future changes)

XML vs JSON: XML and XSLT
• XSLT is a style sheet language for XML
• XSLT is far more sophisticated than CSS

XML vs JSON: XML schema
• Describes the structure of an XML document
• “Well Formed”
• “Valid”

XML vs JSON: XML example (TEI)

XML vs JSON: JSON
• JSON: JavaScript Object Notation
• JSON is a syntax for storing and exchanging data
• JSON is text, written with JavaScript object notation

XML vs JSON: JSON syntax
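As a rough illustration of JSON syntax, the sketch below reads a small JSON text with Python's standard `json` module. The record and its field names are invented for this example; the point is only to show the basic value types (object, array, string, number, boolean, null) and how they map to native data structures.

```python
import json

# Hypothetical record: one JSON object containing each basic value type.
JSON_TEXT = """{
  "title": "XML basics",
  "year": 2021,
  "tags": ["xml", "json"],
  "available": true,
  "publisher": null
}"""

record = json.loads(JSON_TEXT)  # raises json.JSONDecodeError on invalid syntax

print(record["tags"])       # a native list, because JSON has an array type
print(record["year"] + 1)   # a native number, no string conversion needed
print(json.dumps(record, sort_keys=True))  # back to JSON text
```

Unlike a plain XML document, the parsed result already distinguishes numbers, booleans, null and arrays without any extra schema.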