XML & JSON: Interchangeability and Case Studies

Part 1: from text to XML/JSON

Salvatore Cristofaro, Pietro Sichera and Daria Spampinato
Consiglio Nazionale delle Ricerche, Istituto di Scienze e Tecnologie della Cognizione, Catania
1st March 2021

Semantic web
• Classic web enhancement
• Information encoding
• Information ambiguity
• Information transfer systems
• Searching, maintaining and preserving reliable data
• Methods for data use and exchange
XML and JSON

XML and JSON
• Created for the exchange between client and server
• Readable
• Hierarchical
• Many tools that read and use them

XML and JSON: differences
XML:
• Longer
• Needs a parser to be interpreted
• No “array” data type
JSON:
• Shorter
• No separate parser needed in JavaScript
• Native “array” data type
XML and JSON, or XML vs JSON?

Information encoding
• Communication
• Character encoding
• Text storage
• Text transmission

Information encoding: definitions
• String
• Repertoire of characters
• Charset

Information encoding
• Morse
• Enigma
• ASCII

Information encoding: ASCII
Binary: 01001000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100
Hexadecimal: 48 65 6C 6C 6F 20 77 6F 72 6C 64
Text: Hello world
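The ASCII slide above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides; the helper names `to_binary` and `to_hex` are illustrative assumptions.

```python
def to_binary(text: str) -> str:
    """Return the ASCII bytes of text as space-separated 8-bit binary groups."""
    return " ".join(f"{byte:08b}" for byte in text.encode("ascii"))


def to_hex(text: str) -> str:
    """Return the ASCII bytes of text as space-separated uppercase hex pairs."""
    return " ".join(f"{byte:02X}" for byte in text.encode("ascii"))


# Encode the sample string used on the slide.
print(to_binary("Hello world"))
print(to_hex("Hello world"))
```

Calling `encode("ascii")` raises `UnicodeEncodeError` for any character outside the 7-bit repertoire, which is one way to see why the larger character sets discussed next were needed.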

Information encoding: ASCII
• From 128 to 256 characters (from 7 bits to 8 bits)
• Charsets from IBM, HP, Apple, Microsoft
• From code pages to ISO
• ISO vs ANSI

Information encoding: Unicode
• 143,859 characters
• Covering 154 modern and historic scripts
• Character encodings:
  • UTF-32
  • UTF-16
  • UTF-8

Information encoding: UTF-16
• 2 or 4 bytes per character
• 3 encoding schemes:
  • UTF-16
  • UTF-16LE (Little Endian)
  • UTF-16BE (Big Endian)

Information encoding: UTF-8
• 1-4 bytes per character
• 1,112,064 valid character code points in Unicode
• 1 byte: standard ASCII
• 2 bytes: Arabic, Hebrew, most European scripts
• 3 bytes: the rest of the BMP (Basic Multilingual Plane)
• 4 bytes: all remaining Unicode characters

Information encoding: Mojibake
Figure: the UTF-8-encoded Japanese Wikipedia article for Mojibake, as displayed when interpreted as Windows-1252.

Information encoding: UTF-8
• The most common encoding for the World Wide Web
• Accounting for 97% of all web pages
• Up to 100% for some languages

Data exchange: FAIR principles
• Findable
• Accessible
• Interoperable
• Reusable

Data exchange: CSV
CSV advantages:
• CSV is human readable and easy to edit manually
• CSV is simple to implement and parse
• CSV is processed by almost all existing applications
• CSV provides a straightforward information schema
• CSV is faster to handle
• CSV is smaller in size
• CSV is considered to be a standard format
• CSV is compact. In XML you write a start tag and an end tag for each column in each row. In CSV you write the column headers only once.
• CSV is easy to generate
CSV disadvantages:
• CSV can move only the most basic data. Complex configurations cannot be imported and exported this way
• There is no distinction between text and numeric values
• No standard way to represent binary data
• Problems with importing CSV into SQL (no distinction between NULL and quotes)
• Poor support of special characters
• No standard way to represent control characters
• Lack of a universal standard
• Field data may also contain commas or even embedded line breaks

Data exchange: ISO/OSI

Data exchange: HTML - The Web 1.0
• www
• Tim Berners-Lee
• SGML
• Netscape vs Microsoft

Data exchange: HTML - The Web 1.0
• Programming language
• Standard markup language
• Web browser

Data exchange: HTML - The Web 1.0
• Syntax
• Semantics
• Representation
• Behaviour

Data exchange: HTML - The Web 1.0
Figure: EUPORIA web page source.

Data exchange: XML - The Web 1.1
• eXtensible Markup Language
• Specification for the definition of markup languages
• World Wide Web Consortium (W3C)
• HTML as an XML application -> XHTML

Data exchange: XML - The Web 1.1
• Integrity of data in any XML document
• Technology to interoperate with any platform

Data exchange: The way to JSON: Java, .NET and AJAX
• Sun and Microsoft
• Java
• object-oriented programming languages
• “write once run anywhere”
• .NET, C#
• XML to solve the data interoperability puzzle

Data exchange: The way to JSON: Java, .NET and AJAX
• AJAX: “Asynchronous JavaScript and XML”
• Communications in background
• Single-page Application (SPA)
• JavaScript for everyone
• Web 2.0

Data exchange: JSON
• HTML document containing some JavaScript
• Interoperability across all browsers
• Interchange data between arbitrary languages

Data exchange: JSON
“XML is the most fully developed means of getting data in and out of an AJAX client, but there’s no reason you couldn’t accomplish the same effects using a technology like JavaScript Object Notation or any similar means of structuring data.”

XML vs JSON: XML
• eXtensible Markup Language
• Store and transport data
• Human- and machine-readable

XML vs JSON: XML vs HTML
• XML was designed to carry data
• HTML was designed to display data
• XML tags are not predefined
• HTML tags are predefined

XML vs JSON: XML
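To make the point that XML tags are not predefined more concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The document content, the tag names (`message`, `sender`, `recipient`, `body`) and the helper `child_fields` are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every tag name here is invented by the author of the
# document; none of them is predefined by XML itself.
MESSAGE_XML = """<message lang="en">
  <sender>Alice</sender>
  <recipient>Bob</recipient>
  <body>See you at the workshop.</body>
</message>"""


def child_fields(xml_text: str) -> dict:
    """Parse the document and map each child tag of the root to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


fields = child_fields(MESSAGE_XML)
print(fields)
```

The parser accepts the invented tags without complaint because XML only prescribes the syntax; what the tags mean is left to the application that reads them.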

XML vs JSON: XML syntax rules
• Documents must have a root element
• The prolog is optional
• All elements must have a closing tag
• Elements must be properly nested
• Attribute values must always be quoted
• A document that follows these rules is “well formed”

XML vs JSON: XML elements and attributes
• An element can contain:
  • text
  • attributes
  • other elements
  • or a mix of the above
• An attribute value must be quoted
• Avoid attributes (if unnecessary):
  • attributes cannot contain multiple values (elements can)
  • attributes cannot contain tree structures (elements can)
  • attributes are not easily expandable (for future changes)

XML vs JSON: XML and XSLT
• XSLT is a style sheet language for XML
• XSLT is far more sophisticated than CSS

XML vs JSON: XML schema
• Describes the structure of an XML document
• “Well Formed”
• “Valid”

XML vs JSON: XML example: TEI

XML vs JSON: JSON
• JSON: JavaScript Object Notation
• JSON is a syntax for storing and exchanging data
• JSON is text, written with JavaScript object notation

XML vs JSON: JSON syntax
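As an illustration of JSON syntax and of how an XML document might be carried over to it, the sketch below converts a small XML tree into JSON with Python's standard library. The mapping is an assumption made for this example (attributes become keys prefixed with `@`, repeated child elements become a JSON array, mixed text goes under `#text`); it is one common convention, not a standard and not necessarily the authors' method. The sample data and the helper `element_to_dict` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample document with an attribute and a repeated element.
XML_TEXT = """<library>
  <book id="b1">
    <title>First title</title>
    <keyword>xml</keyword>
    <keyword>json</keyword>
  </book>
</library>"""


def element_to_dict(elem):
    """Convert an element to plain Python data (assumed mapping, see lead-in)."""
    # Attributes become keys prefixed with "@".
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated tag is collected into a list (a JSON array).
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return text  # leaf element: just its text
    if text:
        node["#text"] = text  # text mixed with attributes or children
    return node


root = ET.fromstring(XML_TEXT)
data = {root.tag: element_to_dict(root)}
print(json.dumps(data, indent=2))
```

The two `keyword` elements end up in a single JSON array, which shows the native array type that XML lacks; going back from JSON to XML would need a further convention to decide which keys are attributes and which are elements.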
