Languages of the WEB

Jukka K. Nurminen 21.1.2013

Adversements

• Summer internships to apply – hp://cse.aalto.fi/2013/01/15/summer- internships-in-data-communicaons-soware-2/ • EIT master school info event on 6 February at 16 – 18 – Spend a year in Berlin, Rennes, Paris, Trento, etc. and come back to Aalto to finalize the studies and do the master’s thesis – Innovaon & entrepreneurship minor included Praccalies

• Last me Praccalies + Introducon – Slides available in Noppa • Assignments – GIT training – Other news – Pair formaon during break/aer lecture – Remember to signup for the assignments • Tomorrow 1st mobile lecture Languages for WEB apps Languages for WEB apps

• Programming languages – JavaScript (client side, also server side node.js) – PHP, Python, Pearl, Java, Scala, Erlang, … (server side) • Data interchange and storage languages – XML, JSON (textual) – EXI, BSON, Protobuf (binary) – (HTML) • Presentaon languages – CSS – (HTML) • Metalevel languages – XML Schema, XSLT, JSON Schema HTML

• Tags specify how the title goes here page looks like (e.g. h1) • This is fine for the

My first page stac pages This is my first web • But it is not easy to use this form for data transfer Server-Browser data transfer (stac content)

Browser HTML Server 1 rendering

• When the browser just renders the page created by the server HTML works fine

Server-Browser data transfer (dynamic content) HTML => XML, JSON Browser processing & Server 1 rendering

• If the client needs to process the incoming data HTML is problematic. • E.g. in if only part of the page needs to be redisplayed (as is the case in AJAX) • This requires that the browser is able to parse and understand the data to be able to process it Server-Server data transfer

Server wants to process data coming from other servers

HTML Browser Server 1

HTML? => XML • Server1 is not interested in page presentation => HTML not optimal Server 2 • A data transfer language is needed => XML, JSON

Personalized tags needed

• HTML started with very few tags … • Language evolved, as more tags were added – forms – tables – fonts – frames ⇒ Need for personalized tags for different domains ⇒ E.g. for math, music, purchase orders, … MathML

2 • Designed to • x + 4x + 4 =0 express x 2 semancs of + maths 4 &invisiblemes; x + • Cut & paste 4 into Maple, = 0 Mathemaca

Applicaons need to parse the data

• HTML syntax was loosely defined which makes its programmac manipulaon difficult – Closing tags are oponal – Different browsers tolerated different violaons of HTML syntax – XHTML tried to be strict but its adopon was slow because of compability issues • HTML5 will replace ⇒ Need for a more structured and well-defined presentaon ⇒Separate language for data presentaon (XML) Elements and aributes

Attributes (for auxiliary data) Tove Elements between tags Jani Reminder Don't forget me this weekend! Borderline between elements and attributes Jani … is not clear. Attributes are mainly for auxiliary data. Elements for main data Suitability of XML

• Suitable for storing and exchanging any data that can plausibly be encoded as text. • Unsuitable for digized data such as photographs, recorded sound, video, and other very large bit sequences – But it works well for storing the metadata of such items • Uses of XML – Data transfer between applicaons – Configuraon files Example Uses

• A web browser, such as Netscape Navigator or Internet Explorer, that displays the document to a reader • A word processor, such as StarOffice Writer, that loads the XML document for editing • A database, such as Microsoft SQL Server, that stores the XML data in a new record • A personal finance program, such as Microsoft Money, that sees the XML as a bank statement • A syndication program that reads the XML document and extracts the headlines for today's news What XML is not

• Not a programming language – you cannot execute XML • Not a network transport protocol – Actual sending with HTTP, FTP, etc. • Not a database – But XML can be stored and retrieved from databases – Some databases are able to return query results in XML form Parsing XML

DOM SAX • Document Object Model • SAX is Simple API for XML (DOM) API • SAX is an event based parser •DOM is a tree model parser • Parses node by node • Stores the enre XML • Doesn’t store the XML in document into memory before memory processing • We can’t insert or delete a • Occupies more memory node • We can insert or delete • SAX generally runs a lile nodes faster than DOM • Traverse in any direcon. • DOM is used as the internal data model of browsers Performance comparison

Source: http://www.devx.com/xml/Article/16922/1954 HTML vs. XML

• HTML • XML – Fixed set of tags – Extensible set of tags – Presentaon oriented – Content oriented – The language of web – The meta language for pages defining new domain specific languages SGML – the father of all (1986)

• SGML Old markup languages: – Standard Generalized troff, tex, etc Markup Language – Meta language for defining languages – Complex, sophiscated, powerful – Very advanced but also very complicated SGML example (1985) Languages for XML manipulaon

• Meta-level languages • XML Schema – For checking the validity of XML document • XSLT – For document transformaons Related standards

• Namespaces – Modular document definion, mulple inheritance, collision avoidance • XPath / XQuery – Navigaon and query of parts of the document • XML linking language (Xlink) – Associaons between mulple resources – Rules for traversal • XML Schema – definion of document structure and custom data types • XSLT – Extensible Stylesheet Language Transformaon – Transformaon of documents XML Schemas and instances

• To be useful all pares have to agree what tags are available and what they mean • => Need to describe what tags are available and how they are used • Also what kind of values should they contain Well-formed and Valid • Well-formed – Syntaccally correct XML • Valid – Matches a defined XML Schema XML Schemas and instances

Defines tags and Compare to object- other parameters oriented programming

XML Schema Class

XML instance Instance (.XML file)

Provides the data Schema definion

• Main alternaves – DTD • Document Type Definion • Borrowed from SGML – XML Schema • XML based way to define XML schemas DTD

]> Limitaons of DTD

• Difficult to define other constraints than element nesng and recurrence • DTD syntax is not XML XML Schema

About XML Schema

• XML language for describing and constraining the content of XML documents • A W3C Recommendaon • Used to specify – The allowed structure of an XML document – The allowed data types contained in XML documents • XML Schema documents are XML documents • Schema document: elements, aributes, and type definions + annotaons Defining a simple element • A simple element is defined as where: – name is the name of the element – the most common values for type are xs:boolean xs:integer xs:date xs:string xs:decimal xs:time • Example • Defining an aribute

• Aributes themselves are always declared as simple types • An aribute is defined as where: – name and type are the same as for xs:element • Other aributes a simple element may have: – default="default value" if no other value is specified – fixed="value" no other value may be specified – use="optional" the aribute is not required (default) – use="required" the aribute must be present • Example – Restricons, or “facets” • The general form for pung a restricon on a text value is: – (or xs:attribute) ... the restrictions ... • For example: – Complex elements

• A complex element is defined as ... information about the complex type... • Example: Local definition of complex type says that elements must occur in this order Globally defined complex type

• Definition

• Use •

XSLT

How to transform XML documents to other kinds of documents? XSLT

• XSLT stands for Extensible Stylesheet Language Transformaons • XSLT is used to transform XML documents into other kinds of documents, e.g. to HTML or other kind of XML • XSLT uses two input files: – The XML document containing the actual data – The XSL document containing both the “framework” in which to insert the data, and XSLT commands to do so Why transform?

• Convert one schema to another • Rearrange data for formang • Some special transforms – XML to HTML— for old browsers – XML to LaTeX—for TeX layout – XML to SVG—graphs, charts, trees – XML to plain-text—occasionally useful Very simple example

• File data.xml: Hello

• File render.xsl: Tells which part of the tree to process

Finds the “message” element XML Summary

• Metamarkup language, standardized by W3C • In comparison to HTML nothing about presentaon • Elements within tags • XML Applicaons – Individual or organizaons agree to use a set of tags • Well-formed - Syntaccally correct XML • Valid - Matches Schema • Schema definions – DTD – document type definion – XML Schema JSON

JavaScript Object Notaon The trouble with XML? XML documents tend to be verbose => Communication overhead

XML Not very easy Browser Server 1 for humans to write or read

XML

• Modern web applications process data in the browser (and increasingly in the server) typically with JavaScript Server 2 • => processing XML data can be heavy

JSON

• JavaScript Object Notaon • Minimal • Textual • Subset of JavaScript

• Increasingly popular • Many services return data in JSON format • Yet another example where simpler is beer JSON Example

• {"skills": { "web":[ {"name": "html", "years": "5” }, { "name": "css", "years": "3” }], "database":[ {"name": "sql", "years": "7"}] }} What is JSON?

• Lightweight data-interchange format – Compared to XML • Simple format – Easy for humans to read and write – Easy for machines to parse and generate • JSON is a text format – Programming language independent – But is especially well suited for JavaScript manipulaon JSON & JavaScript

• You can evaluate the JSON object in JavaScript – Assign it to a variable and access via normal JavaScript mechanisms for objects and arrays • Compare this with the XML approach where you have to traverse the XML tree to extract the values before you can do something with them Similaries between JSON and XML

• They are both 'self-describing' meaning – that values are named, and thus 'humanreadable' • Both are hierarchical. (i.e. You can have values within values.) • Both can be parsed and used by lots of programming languages • Both can be passed around using AJAX (i.e. hpWebRequest) JSON vs. XML

• Lighter and faster than XML as on-the-wire data format – Relevant for mobile devices, data transfer costs • JSON is less verbose – Quicker for humans to write, and probably easier to read • JSON objects are typed while XML data is typeless – JSON types: string, number, array, boolean, – XML data are all string • Nave data form for JavaScript code – XML data needed to be parsed and assigned to variables through tedious DOM APIs – Data is readily accessible as JSON objects in your JavaScript code – Retrieving values is as easy as reading from an object property in your JavaScript code Efficient XML Interchange (EXI)

• What is EXI? • EXI stands for Efficient XML Interchange. It is commonly known as "binary XML" • EXI enables you to operate on XML without being aware that you are using a much smaller binary-formaed XML • EXI is a W3C recommendaon • Benefits of EXI? • EXI avoids the bloatedness of text-formaed XML • Applicaons can generate EXI directly and can operate on EXI directly, without first converng to text-formaed XML • You can validate a binary-formaed XML document in the same way you validate a text-formaed XML document notebook.xml

EXI Do not forget it! shopping list milk, honey

Contains comments, PIs, namespace prefixes. Source: Roger L. Costello, 2011 http://www.xfront.com/EXI/index.html notebook.exi

Ú‹ËÝÝÝË››ÝX›ÛÚ˛ܙÂ[›ÝX›ÛÚÒ ÈÂèÊd``nZ`rZbe +s{£*@Œ ËL ËLŒèhttp://www.category.org categoryEXI¨æêÄÔÊÆé€Ê±7²<âhä Íî„ÍîL쮄 .„&€?îm î -Ìä .n€ÚÒØÖX@ÐÞÜÊòP

How EXI compression works

See: http://www.w3.org/TR/2009/WD-exi-primer-20091208/ Convert into stream of events Event types Efficient coding of events Beer compression by re-arrangement BSON

• The language how MongoDB stores data – Now finding uses outside of MongoDB – Binary serializaon • Lightweight Keeping spaal overhead to a minimum is important for any data representaon format, especially when used over the network. • Traversable BSON is designed to be traversed easily. This is a vital property in its role as the primary data representaon for MongoDB. • Efficient Encoding data to BSON and decoding from BSON can be performed very quickly in most languages due to the use of C data types. • hp://bsonspec.org BSON Examples Protobuf Protocol Buffers (Protobuf)

• For serializaon of structured data • Developed and used extensively in Google – Internal communicaon between servers – Now available through open-source licence Performance Comparison Others to keep an eye on

• MessagePack Summary of today

• HTML • XML and related languages – XML – DTD – XML Schema – XSLT • JSON • Binary formats – EXI – BSON – Protobuf – Others