Experiences with JSON and XML Transformations IBM Submission to W3C Workshop on Data and Services Integration October 20-21 2011, Bedford, MA, USA

Experiences with JSON and XML Transformations IBM Submission to W3C Workshop on Data and Services Integration October 20-21 2011, Bedford, MA, USA 21 October 2011 John Boyer, Sandy Gao, Susan Malaika, Michael Maximilien, Rich Salz, Jerome Simeon Feedback to: [email protected] Copyright IBM 1 Agenda • Why Do We Want to Transform • Friendliness vs Grotesqueness (Unfriendliness) • Typical JSON and XML Mapping Issues – Round-Trippability – Friendliness – A Use Case • Mapping Approaches – Overview of Approaches 1-4 – Recommendations • Ideas • Some Published Mappings and References • The Grand Finale Copyright IBM 2 Why Do We Want To Transform? • Trend: Javascript/Web 2.0 dominates for: – Fast parsing of message content – Programmer convenience • Trend: Multiple message formats (Atom, XML, JSON, etc) are becoming common – XML APIs and REST APIs are being extended to support JSON • Trend: Enterprises desire validation, declarative constraints and stringency for data content, but also want the benefits of Web 2.0 – Customer demand for: • Constraint and query features provided by XML • Programming ease provided by JSON Copyright IBM 3 Some Data and Metadata Standards Relational XML JSON Linked Data Metadata Data Definition Language XML Schema XSD (W3C) , JSON Schema - RDFS (RDF Schema), Ontology (ISO) Namespaces (W3C) IETF (W3C and elsewhere) Constraints Integrity Constraints in table Schematron (ISO), Relax- - definitions (ISO) NG (OASIS) Triggers Relational triggers - - RIF (W3C) Data Exchange SQL standard (ISO) defines XML is a syntax widely used JSON is a serialized XML, Turtle Serializations an XML serialization but it is for data exchange (W3C) format, there are not widely used – There is no XML representations agreed JSON serialization. of JSON too There are RDF serializations. Annotations Not part of the relational Many kinds of annotations - RDFa (W3C) can be used to model are defined for XML annotate XML schemas and for XML data Query & CRUD Data Manipulation Language XPath, XQuery (W3C) and JAQL, JSONiq SPARQL (W3C) Languages (DML), SQL, SQL/XML (ISO) others for CRUD Query & CRUD Many for various programming Various, includes some of - SPARQL Graph Store HTTP APIs languages, e.g., JDBC, ODBC the relational APIs and Protocol – for CRUD (W3C) specific APIs, e.g., XQJ Collection Table, View, Database (ISO) XML Collection Function - RDF Graphs (W3C) (W3C) Transformation & SQL (Tables to Tables) XSLT (XML to text, includes JavaScript - Other Languages XML); XForms SQL XMLTABLE (XML to relational) Currently, the role of JSON is mainly for data exchange between JavaScript Some XML communities who work on clients and servers query and transformation languages are reviewing the idea of XML languages Copyright IBM 4 supporting JSON Friendly XML • Friendly XML has multiple unique paths (multiple element and attribute names) in the XML, rather than a name value pair design • Friendliness provides – Easy XML consumption by programmers, authors and software – Straightforward transforms, queries, and indexing capabilities Friendly XML Example UnFriendly XML Example <location> <root type="object”> <city>Armonk</city> <item name=“city” value=“Armonk” /> <state>NY</state> <item name =“state” value =“NY” /> </location> </root> Copyright IBM 5 Friendly JSON • Friendly JSON does not include repeating variable names • Data types are directly associated with each variable • Friendliness provides easy data structure consumption by JavaScript programmers, authors Friendly JSON Example UnFriendly JSON Example { { "book" : { "children : "person": { [{ "title" : { "attributes" : "age": "40", { "isbn": "15115115" }, "children" : "name": "John Smith" [ "This book is ", { "emph" : "bold" }] } } }] } } } Copyright IBM 6 Typical JSON and XML Mapping Issues • Round-trip transformations are challenging – XML to JSON to XML – JSON to XML to JSON • General issues – Different character mapping for names and values in JSON and XML – Mismatch in data types between JSON and XML – Semantic infidelity • XML to JSON issues – Losing XML namespaces – Incorrectly mapping XML repeating elements – Mishandling of relative URLs • JSON to XML issues – Incorrectly mapping JSON arrays – Associating variables with data types in JSON Copyright IBM 7 Round-trip • Why is round-tripping important? – Data and data type information has to be preserved – Fidelity, to prevent data loss – Preservation of metadata, order – Maintain usability regardless of format – Some details are unnecessary, depending on use cases – May not help “Friendliness” Copyright IBM 8 Mapping Approach Overview [1.1] [1.2] Approach 1 Only round trip when needed Arbitrary Mapper Friendly with degraded friendliness JSON to XML JSON A XML Focus on Friendliness [1.4] [1.3] Approach 2 [2.1] [2.2] Arbitrary Mapper JSON to XML B XML Example: JSONx at IETF JSON Focus on Round-Tripping [2.4] [2.3] [3.1] [3.2] [3.3] Approach 3 Arbitrary Friendly [3.4] Possibly Mapper Mapper different C C’ XML to JSON XML JSON XML Focus on friendliness [4.2] Approach 4 [4.1] XML to JSON Arbitrary Mapper D JSON Focus on Round-Tripping XML [4.3] [4.4] Copyright IBM 9 Friendly and Round-Trippable Summary • A JSON-to-XML mapping is friendly when the generated XML has meaningful element and attribute names, rather than a name value pair design. • An XML-to-JSON mapping is friendly when the generated JSON has flat structure, making it easy for JavaScript programmers to consume it. • A JSON-to-XML mapping is r ound-trippable when the generated XML contains all the information that was in the original JSON, without any loss. • An XML-to-JSON mapping is round-trippable when the generated JSON contains all the information that was in the original XML, without any loss. Copyright IBM 10 A JSON XML Use Case JavaScript Clients HTTP-REST/JSON Friendly JSON Gateway SOAP/XML JSONx Friendly XML SOAP Applications Copyright IBM 11 Approach #1: JSON to Friendly XML • Requirement 1: Friendly XML processing • Requirement 2: Round-trippable • Rules: – JSON names become XML element names – JSON object members and arrays become XML elements, and – Simple values become XML text content – Synthesized XML 'root' element – Special handling to preserve data type, kinds of emptiness, and special characters Copyright IBM 12 Approach #1: Example In this example, { "city": "Armonk", "state": "NY" } becomes <root type="object"> <city>Armonk</city> <state>NY</state> </root> Copyright IBM 13 Approach #1: Special characters in names and empty names A JSON name may not match an XML NCName • Each illegal character is escaped • with two leading underscores, • a hexadecimal encoding of the character's Unicode codepoint and • A trailing underscore { <root type="object"> "var$":"value", <var__24_>value</var__24_> "":"generic" <__>generic</__> } </root> Copyright IBM 14 Approach #1: Simple Values JSON name/value pair with a simple value • Generate XML element from name • Store string equivalent of the simple value as content of the XML element • Typed as boolean, number, string, or null { <root type="object"> "age":43, <age type="number">43</age> "married":true, <married type="boolean">true</married> "address":null <address nil="true"></address> } </root> Copyright IBM 15 Approach #1: Object Values JSON name/value pair with a non-null object value, • Generate XML element based on the name • Generate child elements for each of the JSON name/value pairs <root type="object"> { <address type="object"> "address": { "street" : "123 Main St." } <street>123 Main St.</street> } </address> </root> Copyright IBM 16 Approach #1: Arrays JSON name/value pair with an array value • Generate a container XML element for the array • Generate one child XML element for each value in the array <root type="object"> <locations type="array"> <__>Amsterdam</__> { <__>London</__> "locations": ["Amsterdam", "London"], </locations> <secondary_locations "secondary_locations": [] type="array"></secondary_locations> } </root> Copyright IBM 17 Approach #1: Special Characters In Values Characters are copied from JSON except for certain conversions: < < > > & & \udddd &#xdddd \” “ \\ \ \/ / \n Newline (0x0A) \r &#xD \t tab(0x09) Non-XML chars(\b, \f) Omitted, or encoded as PI Copyright IBM 18 Approach #2: JSON to XML and Back • Requirement: Round-trippable • Usually a name/value pair • Rules: – XML elements to preserve data type – XML elements to preserve objects – XML elements to preserve arrays – XML attributes/text to preserve property names Copyright IBM 19 Approach #2: Example In this example, ... "batters": { "batter": [ { "id": 2001, "type": "Regular" }, { "id": 2003, "type": "Blueberry" } ] }, ... becomes <json:object name="batters"> <json:array name="batter"> <json:object> <json:number name="id">2001</json:number> <json:string name="type">Regular</json:string> </json:object> <json:object> <json:number name="id">2003</json:number> <json:string name="type">Blueberry</json:string> </json:object> </json:array> </json:object> Copyright IBM 20 Approach #3: XML to Friendly JSON • Requirement: Consumable/Friendly JSON • Requirement: Preserve XML structure • Rules: – XML element or attribute names become JSON object names, – XML children elements become JSON objects fields or arrays, and – XML text nodes become JSON simple values Copyright IBM 21 Approach #3: Multiple Child Elements With Same Name When multiple child elements have the same name, use JSON arrays to represent those elements. { "people": { <people> "person": [ <person> { <name>John Smith</name> "age": "40", "name": "John Smith" <age>40</age>

Experiences with JSON and XML Transformations IBM Submission to W3C Workshop on Data and Services Integration October 20-21 2011, Bedford, MA, USA

Deserialization Vulnerability by Abdelazim Mohammed(@Intx0x80)

Towards an Analytics Query Engine *

Evaluation of Xpath Queries on XML Streams with Networks of Early Nested Word Automata Tom Sebastian

Semantic FAIR Data Web Me

Atom-Feeds for Inspire

ISCRAM2005 Conference Proceedings Format

QUERYING JSON and XML Performance Evaluation of Querying Tools for Offline-Enabled Web Applications

Atom Syndication Format Xml Schema

Open Search Environments: the Free Alternative to Commercial Search Services

Programming a Parallel Future

SHACL Satisfiability and Containment

The IAEA on Line: Closer Links for the Global Nuclear Community