XML & JSON: interchangeability and case studies
Part 1: from text to XML/JSON
Salvatore Cristofaro, Pietro Sichera and Daria Spampinato
Consiglio Nazionale delle Ricerche, Istituto di Scienze e Tecnologie della Cognizione, Catania
Semantic web
• Classic web enhancement
• Information encoding
• Information ambiguity
• Information transfer systems
• Searching, maintaining and preserving reliable data
• Methods for data use and exchange
XML and JSON
Salvatore Cristofaro, Pietro Sichera and Daria Spampinato – 1st March 2021
XML and JSON
• Created for data exchange between client and server
• Human-readable
• Hierarchical
• Supported by many tools that read and use them
XML and JSON: differences
XML
• Longer
• Needs a dedicated XML parser
• No native “array” data type
JSON
• Shorter
• Parsed by a standard built-in function (e.g. JSON.parse in JavaScript)
• Native “array” data type
XML and JSON, or XML vs JSON?
Information encoding
• Communication
• Character encoding
• Text storage
• Text transmission
Definitions
• String
• Repertoire of characters
• Charset
Information encoding
• Morse
• Enigma
• ASCII
ASCII
Binary: 01001000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100
Hexadecimal: 48 65 6C 6C 6F 20 77 6F 72 6C 64
Text: Hello world
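The correspondence between a text, its bytes, and their binary and hexadecimal notation can be sketched in a few lines of Python. This is only an illustration added for clarity, not part of the original slides; the variable names are arbitrary.

```python
# Encode a short text as ASCII and show each byte in binary and hexadecimal.
text = "Hello world"
data = text.encode("ascii")  # ASCII uses one byte per character

binary = " ".join(f"{byte:08b}" for byte in data)
hexadecimal = " ".join(f"{byte:02X}" for byte in data)

print(binary)
print(hexadecimal)  # 48 65 6C 6C 6F 20 77 6F 72 6C 64

# Decoding the same bytes gives back the original text.
decoded = bytes.fromhex(hexadecimal).decode("ascii")
print(decoded)
```

Each group of eight bits in the binary line corresponds to one pair of hexadecimal digits, and each pair to one character.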
ASCII
• From 128 to 256 characters (from 7 bits to 8 bits)
• Charsets from IBM, HP, Apple, Microsoft
• From code pages to ISO standards
• ISO vs ANSI
Unicode
• 143,859 characters (Unicode 13.0)
• Covering 154 modern and historic scripts
• Character encodings:
  • UTF-32
  • UTF-16
  • UTF-8
UTF-16
• 2 or 4 bytes per character
• 3 encoding schemes:
  • UTF-16 (byte order signalled by an optional byte order mark)
  • UTF-16LE (Little Endian)
  • UTF-16BE (Big Endian)
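The difference between the three UTF-16 schemes can be observed with Python's built-in codecs. The sketch below is an illustration only; the sample characters are chosen arbitrarily.

```python
import codecs

letter = "A"          # U+0041, inside the Basic Multilingual Plane
emoji = "\U0001F600"  # U+1F600, outside the BMP

# Same character, opposite byte order.
little = letter.encode("utf-16-le")
big = letter.encode("utf-16-be")
print(little.hex(), big.hex())  # 4100 0041

# The plain "utf-16" codec prepends a byte order mark (BOM) when encoding.
with_bom = letter.encode("utf-16")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print(has_bom)  # True

# Characters outside the BMP need a surrogate pair: 4 bytes instead of 2.
emoji_bytes = emoji.encode("utf-16-be")
print(len(emoji_bytes), emoji_bytes.hex())  # 4 d83dde00
```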
UTF-8
• 1 to 4 bytes per character
• Can encode all 1,112,064 valid character code points in Unicode:
  • 1 byte: standard ASCII
  • 2 bytes: Arabic, Hebrew, most European scripts
  • 3 bytes: rest of the BMP (Basic Multilingual Plane)
  • 4 bytes: all remaining Unicode characters (supplementary planes)
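The variable length of UTF-8 can be checked directly by encoding a few characters and counting the bytes. This is a minimal sketch; the sample characters are assumptions chosen to hit each length class.

```python
# One sample character per UTF-8 length class (descriptions are informal).
samples = {
    "A": "Latin capital letter A (ASCII)",
    "\u00e8": "Latin small letter e with grave",
    "\u05e9": "Hebrew letter shin",
    "\u20ac": "euro sign (BMP, beyond U+07FF)",
    "\U0001F600": "grinning face emoji (supplementary plane)",
}

# Number of bytes each character occupies once encoded as UTF-8.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, description in samples.items():
    print(f"U+{ord(ch):04X} {description}: {lengths[ch]} byte(s)")
```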
Mojibake
Figure: the UTF-8-encoded Japanese Wikipedia article for Mojibake, displayed as if it were encoded in Windows-1252
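Mojibake is easy to reproduce: encode a text with one charset and decode the bytes with another. The sketch below uses an Italian word instead of the Japanese article from the slide (an assumption made to keep the example short) and Python's `cp1252` codec for Windows-1252.

```python
original = "perch\u00e9"  # "perché"

# The text is stored as UTF-8...
utf8_bytes = original.encode("utf-8")

# ...but read back as if it were Windows-1252.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # perchÃ©

# In this case no information was lost, so the mistake can be undone.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # perché
```

The two-byte UTF-8 sequence for "é" is shown as two separate Windows-1252 characters, which is the typical signature of this kind of error.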
UTF-8
• The most common encoding on the World Wide Web
• Accounts for about 97% of all web pages
• Up to 100% for some languages
Data exchange
FAIR principles
Findable
Accessible
Interoperable
Reusable
CSV
CSV advantages
• CSV is human-readable and easy to edit manually
• CSV is simple to implement and parse
• CSV is processed by almost all existing applications
• CSV provides a straightforward information schema
• CSV is faster to handle
• CSV is smaller in size
• CSV is considered to be a standard format
• CSV is compact: XML needs a start tag and an end tag for each column in each row, whereas in CSV the column headers are written only once
• CSV is easy to generate
CSV disadvantages
• CSV allows only the most basic data to be moved; complex configurations cannot be imported and exported this way
• There is no distinction between text and numeric values
• No standard way to represent binary data
• Problems with importing CSV into SQL (no distinction between NULL and an empty quoted string)
• Poor support for special characters
• No standard way to represent control characters
• Lack of a universal standard
• Field data may also contain commas or even embedded line breaks
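Two of the disadvantages listed above (fields containing commas or line breaks, and the lack of a distinction between text and numbers) can be illustrated with Python's standard `csv` module. The sample rows are invented for the example.

```python
import csv
import io

# A header row and one data row with a comma, a line break and a number.
rows = [
    ["name", "note", "count"],
    ["Rossi, Mario", "line one\nline two", 42],
]

# Write to an in-memory buffer; the writer quotes the problematic fields.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back: the structure survives thanks to the quoting...
parsed = list(csv.reader(io.StringIO(csv_text, newline="")))
print(parsed[1][0])  # Rossi, Mario

# ...but the number comes back as a string, because CSV has no data types.
print(repr(parsed[1][2]))  # '42'
```

A naive split on commas and newlines would break this file, which is why a real CSV parser is needed even for a format that looks trivial.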
ISO/OSI
HTML - The Web 1.0!
• www!
• Tim Berners-Lee!
• SGML!
• Netscape vs Microsoft !
HTML - The Web 1.0
• Programming language
• Standard markup language
• Web browser
HTML - The Web 1.0
• Syntax
• Semantics
• Representation
• Behaviour
HTML - The Web 1.0
Figure: EUPORIA web page source
XML - The Web 1.1
• eXtensible Markup Language
• Specification for the definition of markup languages
• World Wide Web Consortium (W3C)
• HTML as an XML application -> XHTML
XML - The Web 1.1
• Integrity of data in any XML document
• Technology to interoperate with any platform
The way to JSON: Java, .NET and AJAX
• Sun and Microsoft
• Java
  • object-oriented programming language
  • “write once, run anywhere”
• .NET, C#
  • XML to solve the data interoperability puzzle
The way to JSON: Java, .NET and AJAX
• AJAX: “Asynchronous JavaScript and XML”
• Communication in the background
• Single-page Application (SPA)
• JavaScript for everyone
• Web 2.0
JSON
• HTML document containing some JavaScript
• Interoperability across all browsers
• Data interchange between arbitrary languages
JSON
“XML is the most fully developed means of getting data in and out of an AJAX client, but there’s no reason you couldn’t accomplish the same effects using a technology like JavaScript Object Notation or any similar means of structuring data.”
XML vs JSON
XML
• eXtensible Markup Language
• Stores and transports data
• Human- and machine-readable
XML vs HTML
• XML was designed to carry data
• HTML was designed to display data
• XML tags are not predefined
• HTML tags are predefined
XML syntax rules
• Documents must have a root element
• The prolog is optional
• All elements must have a closing tag
• Elements must be properly nested
• Attribute values must always be quoted
• A document that follows these rules is well-formed
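The syntax rules above are exactly what an XML parser checks before it accepts a document. The following sketch uses Python's standard `xml.etree.ElementTree` module; the helper `is_well_formed` and the sample documents are hypothetical, written only for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# One correct document and one violation for each of four rules.
examples = {
    "well formed": '<note lang="en"><to>Ann</to><body>Hi</body></note>',
    "missing closing tag": "<note><to>Ann</note>",
    "badly nested": "<note><to><b>Ann</to></b></note>",
    "unquoted attribute": "<note lang=en></note>",
    "two root elements": "<note></note><note></note>",
}

results = {label: is_well_formed(doc) for label, doc in examples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first document is accepted. Note that this checks well-formedness only; checking validity against a schema requires additional tooling.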
XML elements and attributes
• An element can contain:
  • text
  • attributes
  • other elements
  • or a mix of the above
• Attribute values must be quoted
• Avoid attributes (if unnecessary):
  • attributes cannot contain multiple values (elements can)
  • attributes cannot contain tree structures (elements can)
  • attributes are not easily expandable (for future changes)
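The difference between attributes and child elements becomes visible when a document is read programmatically. In the sketch below the `book` document is an invented example: the identifier is a single value and fits an attribute, while the repeated `part` entries need elements.

```python
import xml.etree.ElementTree as ET

# Invented sample document: one attribute, several child elements.
document = """
<book id="b1">
  <title>Divina Commedia</title>
  <author>Dante Alighieri</author>
  <part>Inferno</part>
  <part>Purgatorio</part>
  <part>Paradiso</part>
</book>
""".strip()

book = ET.fromstring(document)

# An attribute holds exactly one string value.
book_id = book.get("id")

# Elements can repeat, so multiple values are represented naturally.
title = book.findtext("title")
parts = [part.text for part in book.findall("part")]

print(book_id, title, parts)
```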
XML and XSLT
• XSLT is a style sheet language for XML
• XSLT is far more sophisticated than CSS
XML schema
• Describes the structure of an XML document
• “Well-formed”: the document follows the XML syntax rules
• “Valid”: the document is well-formed and also conforms to its schema
XML example: TEI
JSON
• JSON: JavaScript Object Notation
• JSON is a syntax for storing and exchanging data
• JSON is text, written with JavaScript object notation
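Because JSON is plain text, any language with a JSON library can read and write it. The sketch below uses Python's standard `json` module; the record itself is an invented example.

```python
import json

# JSON text as it might arrive from a server (invented sample record).
json_text = (
    '{"name": "Dante", "born": 1265, '
    '"works": ["Vita Nuova", "Divina Commedia"], '
    '"alive": false, "nickname": null}'
)

# Parsing turns the text into native data structures.
record = json.loads(json_text)
print(type(record).__name__)  # dict
print(record["works"][1])     # Divina Commedia

# JSON types map to native types: number, array, boolean, null.
print(record["born"] + 1, record["alive"], record["nickname"])

# Serialising goes the other way, from data structure back to text.
round_trip = json.loads(json.dumps(record))
print(round_trip == record)  # True
```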
JSON syntax
JSON schema
JSON example
JSON vs XML
• JSON is like XML because:
  • Both JSON and XML are "self-describing" (human-readable)
  • Both JSON and XML are hierarchical (values within values)
  • Both JSON and XML can be parsed and used by lots of programming languages
• JSON is unlike XML because:
  • JSON doesn't use end tags
  • JSON is shorter
  • JSON is quicker to read and write
  • JSON can use arrays
• XML is much more difficult to parse than JSON
• JSON is parsed into a ready-to-use JavaScript object
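The points above can be made concrete by encoding the same record in both formats and reading it back. The sketch is in Python rather than JavaScript, and the record and the XML element names are assumptions made for the example; other XML layouts for the same data are equally possible.

```python
import json
import xml.etree.ElementTree as ET

# The same invented record in both formats.
json_text = '{"author": "Dante", "works": ["Vita Nuova", "Convivio"]}'
xml_text = (
    "<record><author>Dante</author>"
    "<works><work>Vita Nuova</work><work>Convivio</work></works></record>"
)

# JSON: a single call yields a ready-to-use structure, including the array.
from_json = json.loads(json_text)

# XML: the tree must be walked and the list rebuilt by hand.
root = ET.fromstring(xml_text)
from_xml = {
    "author": root.findtext("author"),
    "works": [work.text for work in root.findall("works/work")],
}

print(from_json == from_xml)           # True
print(len(json_text) < len(xml_text))  # True: no end tags in JSON
```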
JSON vs XML
• XML has an external schema (defined outside the document)
• XML has a more powerful schema language
JSON and XML
Thank you!