XML & JSON: interchangeability and case studies

Part 1: from text to XML/JSON

Salvatore Cristofaro, Pietro Sichera and Daria Spampinato

Consiglio Nazionale delle Ricerche, Istituto di Scienze e Tecnologie della Cognizione, Catania

Semantic web

• Classic web enhancement
• Information encoding
• Information ambiguity
• Information transfer systems
• Searching, maintaining and preserving reliable data
• Methods for data use and exchange

XML and JSON

Salvatore Cristofaro, Pietro Sichera and Daria Spampinato – 1st March 2021

XML and JSON

• Created for the exchange between client and server
• Readable
• Hierarchical
• Many tools read and use them

XML and JSON: differences

XML
• Longer
• Must be parsed with an XML parser
• No native "array" data type

JSON
• Shorter
• Can be parsed by a standard JavaScript function
• Native "array" data type

XML and JSON, or XML vs JSON?

Information encoding

• Communication
• Text storage
• Text transmission

Definitions

• String
• Repertoire of characters
• Charset

Historical examples:

• Morse
• Enigma
• ASCII

ASCII

Binary: 01001000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100

Hexadecimal: 48 65 6C 6C 6F 20 77 6F 72 6C 64

Text: Hello world
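The three rows above are the same bytes written in different notations. A minimal Python sketch of that correspondence, assuming the text is pure ASCII (the variable names are illustrative only):

```python
text = "Hello world"

# Encode the string to bytes with the ASCII charset (one byte per character).
data = text.encode("ascii")

# Render every byte as 8 binary digits and as 2 uppercase hexadecimal digits.
binary = " ".join(f"{byte:08b}" for byte in data)
hexadecimal = " ".join(f"{byte:02X}" for byte in data)

print(binary)
print(hexadecimal)

# Decoding the bytes with the same charset gives back the original text.
assert data.decode("ascii") == text
```

The lowercase "w" is 0x77 (01110111); an uppercase "W" would be 0x57 (01010111), so a single bit separates the two letters.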

Extended ASCII

• From 128 to 256 characters (from 7 bits to 8 bits)
• Charsets from IBM, HP, Apple and others
• From to ISO
• ISO vs ANSI

Unicode

• 143,859 characters (Unicode 13.0)
• Covering 154 modern and historic scripts

Encodings:
• UTF-32
• UTF-16
• UTF-8

UTF-16

• 2 or 4 bytes per character
• 3 encoding schemes:
  • UTF-16
  • UTF-16LE (little endian)
  • UTF-16BE (big endian)
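The difference between the schemes is only the order of the bytes inside each 16-bit unit. A small Python sketch, using the standard codec names `utf-16-be`, `utf-16-le` and `utf-16`; the sample string is an arbitrary choice for illustration:

```python
# "A" lies in the Basic Multilingual Plane; U+1F600 (an emoji) does not.
text = "A\U0001F600"

big_endian = text.encode("utf-16-be")
little_endian = text.encode("utf-16-le")

# "A" takes 2 bytes; the emoji takes 4 bytes (a surrogate pair).
print(big_endian.hex(" "))
print(little_endian.hex(" "))

# The plain "utf-16" codec prepends a byte order mark (BOM) so that a
# reader can tell which byte order follows; the order itself depends
# on the platform.
with_bom = text.encode("utf-16")
print(with_bom[:2].hex(" "))
```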

UTF-8

• 1 to 4 bytes per character
• 1,112,064 valid character code points in Unicode
  • 1 byte: standard ASCII
  • 2 bytes: Hebrew, most European scripts
  • 3 bytes: rest of the BMP (Basic Multilingual Plane)
  • 4 bytes: all other Unicode characters (outside the BMP)
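The byte counts above can be checked directly. A minimal Python sketch; the sample characters are chosen here for illustration and are not taken from the slides:

```python
samples = {
    "A": "Latin letter (ASCII)",
    "\u00e9": "Latin letter with accent",
    "\u05d0": "Hebrew letter alef",
    "\u20ac": "euro sign (in the BMP)",
    "\U0001F600": "emoji (outside the BMP)",
}

lengths = {}
for char, description in samples.items():
    encoded = char.encode("utf-8")
    lengths[char] = len(encoded)
    print(f"U+{ord(char):04X} {description}: {len(encoded)} byte(s)")

# All code points from 0 to 0x10FFFF, minus the 2,048 surrogates
# (0xD800..0xDFFF) that are reserved for UTF-16.
valid_code_points = 0x110000 - (0xDFFF - 0xD800 + 1)
print(valid_code_points)
```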

Mojibake

The UTF-8-encoded Japanese Wikipedia article for mojibake as displayed if interpreted as Windows-1252 encoding
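The same effect can be reproduced in a few lines. This is a sketch with an Italian word chosen for illustration (not the Japanese article from the slide); `cp1252` is Python's codec name for Windows-1252:

```python
original = "perch\u00e9"  # "perché", ending with U+00E9

# Correct step: the writer stores the text as UTF-8.
utf8_bytes = original.encode("utf-8")

# Wrong step: the reader assumes Windows-1252, so the two bytes that
# encode the accented letter are shown as two separate characters.
garbled = utf8_bytes.decode("cp1252")
print(garbled)

# As long as no byte was lost, reversing the wrong step repairs the text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)
```

Some UTF-8 byte values have no assigned character in Windows-1252, so for other texts the wrong decoding step may raise an error instead of producing garbled output.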

UTF-8

• The most common encoding for the World Wide Web
• Accounts for 97% of all web pages
• Up to 100% for some languages

Data exchange

FAIR principles

• Findable
• Accessible
• Interoperable
• Reusable

CSV

CSV advantages
• CSV is human readable and easy to edit manually
• CSV is simple to implement and parse
• CSV is processed by almost all existing applications
• CSV provides a straightforward information schema
• CSV is faster to handle
• CSV is smaller in size
• CSV is considered to be a standard format
• CSV is compact. In XML you write a start tag and an end tag for each column in each row. In CSV you write the column headers only once.
• CSV is easy to generate

CSV disadvantages
• CSV can move only the most basic data. Complex configurations cannot be imported and exported this way
• There is no distinction between text and numeric values
• No standard way to represent binary data
• Problems with importing CSV into SQL (no distinction between NULL and quotes)
• Poor support of special characters
• No standard way to represent control characters
• Lack of a universal standard
• Field data may also contain commas or even embedded line breaks
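Two of the disadvantages listed above can be observed with Python's standard `csv` module. A minimal sketch; the column names and values are invented for illustration:

```python
import csv
import io

rows = [
    ["name", "note", "amount"],
    ["Rossi, Mario", "first line\nsecond line", 42],
]

# The writer quotes any field that contains the delimiter or a line break.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())

# Reading back: the comma and the line break survive inside one field,
# but every value comes back as a string, including the number.
parsed = list(csv.reader(io.StringIO(buffer.getvalue())))
print(parsed[1])
```

The quoting convention used here is the module's default dialect; other tools may follow different rules, which is one reason a field with embedded commas can break an import.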

ISO/OSI

HTML - The Web 1.0

• WWW
• Tim Berners-Lee
• SGML
• Netscape vs Microsoft

• Not a programming language
• A standard markup language

• Syntax
• Semantics
• Representation
• Behaviour

Example: EUPORIA web page source

XML - The Web 1.1

• eXtensible Markup Language
• Specification for the definition of markup languages
• World Wide Web Consortium (W3C)
• HTML as an XML application -> XHTML

• Integrity of data in any XML document
• Technology to interoperate with any platform

The way to JSON: Java, .NET and AJAX

• Sun and Microsoft
• Java
  • Object-oriented programming language
  • "Write once, run anywhere"
• .NET, C#
  • XML to solve the data interoperability puzzle

• AJAX: "Asynchronous JavaScript and XML"
• Communications in the background
• Single-page application (SPA)
• JavaScript for everyone
• Web 2.0

JSON

• An HTML document containing some JavaScript
• Interoperability across all browsers
• Data interchange between arbitrary languages

“XML is the most fully developed means of getting data in and out of an AJAX client, but there’s no reason you couldn’t accomplish the same effects using a technology like JavaScript Object Notation or any similar means of structuring data.”

XML vs JSON

XML

• eXtensible Markup Language
• Stores and transports data
• Human- and machine-readable

XML vs HTML

• XML was designed to carry data
• HTML was designed to display data
• XML tags are not predefined
• HTML tags are predefined

XML syntax rules

• Documents must have a root element
• The prolog is optional
• All elements must have a closing tag
• Elements must be properly nested
• Attribute values must always be quoted
• A document that follows these rules is well formed
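A parser rejects any document that breaks these rules. A minimal Python sketch using the standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample documents are assumptions made for this example:

```python
import xml.etree.ElementTree as ET

well_formed = (
    '<?xml version="1.0"?>'
    '<note lang="en"><to>Anna</to><body>Hello</body></note>'
)
# Unquoted attribute value and badly nested closing tags.
broken = "<note lang=en><to>Anna<body>Hello</to></body></note>"
# Two top-level elements, so there is no single root.
no_root = "<to>Anna</to><body>Hello</body>"


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


for sample in (well_formed, broken, no_root):
    print(is_well_formed(sample))
```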

XML elements and attributes

• An element can contain:
  • text
  • attributes
  • other elements
  • or a mix of the above
• Attribute values must be quoted
• Avoid attributes (if unnecessary):
  • attributes cannot contain multiple values (elements can)
  • attributes cannot contain tree structures (elements can)
  • attributes are not easily expandable (for future changes)
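The advice on attributes can be illustrated with two encodings of the same record. This is a sketch with invented element and attribute names, parsed with Python's standard `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Attribute version: the two phone numbers are squeezed into one string,
# and the reader has to know how to split it.
with_attributes = ET.fromstring('<person name="Anna" phones="111 222"/>')

# Element version: each value gets its own element, and each element
# can later grow its own attributes or children.
with_elements = ET.fromstring(
    "<person>"
    "<name>Anna</name>"
    "<phone type='home'>111</phone>"
    "<phone type='work'>222</phone>"
    "</person>"
)

phones_from_attribute = with_attributes.get("phones").split()
phones_from_elements = [phone.text for phone in with_elements.findall("phone")]

print(phones_from_attribute)
print(phones_from_elements)
```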

XML and XSLT

• XSLT is a style sheet language for XML
• XSLT is far more sophisticated than CSS

XML schema

• Describes the structure of an XML document
• "Well formed": the document follows the XML syntax rules
• "Valid": the document is well formed and also conforms to its schema

XML example: TEI

JSON

• JSON: JavaScript Object Notation
• JSON is a syntax for storing and exchanging data
• JSON is text, written with JavaScript object notation
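Because JSON is plain text, any language can produce and consume it. A minimal Python sketch with the standard `json` module; the record itself is made up for illustration:

```python
import json

record = {
    "title": "XML and JSON",
    "year": 2021,
    "topics": ["encoding", "data exchange"],
    "draft": False,
    "notes": None,
}

# Serialising gives an ordinary string that can be stored or transmitted.
text = json.dumps(record)
print(text)

# Parsing the string restores the same structure, with numbers, booleans,
# arrays and null mapped back to the corresponding Python types.
restored = json.loads(text)
print(restored == record)
```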

JSON syntax

JSON schema

JSON example

JSON vs XML

• JSON is like XML because:
  • Both JSON and XML are "self describing" (human readable)
  • Both JSON and XML are hierarchical (values within values)
  • Both JSON and XML can be parsed and used by lots of programming languages
• JSON is unlike XML because:
  • JSON doesn't use end tags
  • JSON is shorter
  • JSON is quicker to read and write
  • JSON can use arrays
• XML is much more difficult to parse than JSON
• JSON is parsed into a ready-to-use JavaScript object
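The comparison above can be made concrete by reading the same record from both formats. A sketch in Python rather than JavaScript, so the "ready-to-use object" is a Python dictionary; the record and its field names are invented:

```python
import json
import xml.etree.ElementTree as ET

json_text = '{"name": "Anna", "languages": ["Italian", "English"]}'
xml_text = (
    "<person><name>Anna</name>"
    "<languages><language>Italian</language>"
    "<language>English</language></languages>"
    "</person>"
)

# JSON: one call returns nested dictionaries and lists.
from_json = json.loads(json_text)

# XML: the parser returns a tree, and the application decides which
# repeated elements should be treated as an array.
root = ET.fromstring(xml_text)
from_xml = {
    "name": root.findtext("name"),
    "languages": [item.text for item in root.iter("language")],
}

print(from_json)
print(from_xml)
```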

JSON vs XML: schemas

• An XML schema is defined outside the document
• XML has more powerful schema languages

JSON and XML

Thank you!