Presentation of XML Documents
Patryk Czarnik
Institute of Informatics University of Warsaw
XML and Modern Techniques of Content Management – 2010/11
Stylesheets Separating content and presentation Associating style with document
CSS Idea and applications CSS sheet structure Formatting
XSL Presentation by transformation XSLT — Introduction XSL-FO Separating content and presentation
I According to XML best practices documents contain:
I content/data I markup for structure and meaning (semantical markup) I no formatting I How to present?
I “hard-coded” interpretation of known document types I external style sheets
Stylesheets idea
* yellow backgorund * font 'Times 10pt' * blue font and border * 12pt for name * italic font for name * abbreviation before * typewritter font for email phone number Advantages of separating content and formatting
I General advantages of semantical markup
I better content understanding and easier analysis I Ability to show again
I the same document after modification I another document of the same structure I Formatting managed in one place
I easy to change look of whole class of documents I Ability to define many stylesheets for a document, depending on:
I purpose I medium (screen, paper, voice) I detail level (full report, summary) I reader preferences (font size, colors, . . . )
Standards related to XML presentation
I General:
I Associating Style Sheets with XML documents I Stylesheet languages:
I DSSSL (deprecated, applied to SGML) Document Style Semantics and Specification Language I CSS Cascading Style Sheets I XSL Extensible Stylesheet Language Associating Style Sheets with XML documents
I W3C Recommendation
I Processing instruction xml-stylesheet
One stylesheet
xsl"?>
Alternative stylesheets
css" href="blue.css" ?>
Cascading Style Sheets
I Style sheets idea beginnings: 1970s
I motivation — different printer languages and properties
I CSS beginnings: 1994
I CSS Level 1 Recommendation: December 1996 I CSS Level 2 Recommendation: May 1998
I implemented (more or less completly) in present-day internet browsers I CSS Level 3 — work still in progress:
I modularisation, I new features. I CSS 2.1 — Candidate Recommendation:
I CSS 2 corrected and made less ambiguous I also, a bit less ambitious (unsupported features thrown out) CSS applications
I First and main application: style for WWW sites
I Separation of content and formatting for HTML
I Simple stylesheets for XML
Ideologically important ! especially from Level 2 ! according to web accessibility
I support for alternative presentation methods (e.g. voice)
I enabling reader to provide her/his own style (e.g. larger font and contrast colors for people with eyesight disability)
Example XML
< person position="manager" id="102104">
person { display: block; margin: 10px auto 10px 30px; padding: 0.75em 1em; width: 200px; border-style: solid; border-width: 2px; border-color: #002288; background-color: #FFFFFF; }
person[position=’manager’] { background-color: #DDFFDD; }
name, surname { display: inline; font-size: larger; }
person[position=’manager’] name { font-weight: bold; }
Result of formatting Selectors
I person — element name I name, surname — two elements I department name — descendant I department > name — child I A + B — element B directly following A I name:first-child — ‘name’ being first child I person[position] — test of attribute existence I person[position=’manager’] — attribute value test I person[position~=’manager’] — value is element of list I person#k12 — ID-type attribute value I ol.staff — equivalent to ol[class~=’staff’] (HTML specific).
Style depending on media type
Example
@media print { person { background-color: white; font-family: serif; } }
@media screen { person { background-color: yellow; font-family: sans-serif; } }
@media all { person { border-style: solid; } } display property
I Kind of visual object representing an element
I Allowed values: inline, block, list-item, run-in inline-block table, table-cell,..., none
I Basic property in case of XML visualisation
I Rare exeplicit use for HTML (browsers know default values for HTML elements)
Boxes and positions
I Blocks nested according to nesting of elements in document
I Blocks layout handled by CSS
I Custom positioning possible
I margin, padding — external and internal margins
I border-style, border-color, border-width — border properties
I position (static | relative | absolute | fixed)— positioning policy
I left, right, top, bottom — position coordinates
I width, height, min-width, max-height, . . . — size Boxes, margins, borders
cite: W3C, CSS Level 2 Recommendation
Margins and borders — example
person { display: block; margin: 10px auto 10px 30px; padding: 0.75em 1em; width: 200px; border-style: solid; border-width: 2px; border-color: #002288; background-color: #FFFFFF; } Text and font properties
I color, background-color, background-image — color and background,
I font-family — name or generic family of font
I font-size
I font-style, font-weight
I text-decoration
I text-align
Fonts and colors example
company { display: block; background-color: #EEFFFF; color: rgb(0, 0, 33%); font-family: ’Bookman’, ’Times New Roman’, serif; font-size: 14pt; }
company > name { font-size: 1.5em; font-family: ’Verdana’, ’Arial’, sans-serif; font-weight: bold; text-align: center; }
department > name { font-size: 1.3em; font-family: ’Verdana’, ’Arial’, sans-serif; font-weight: bold; font-style: italic; } Generated content
I Ability to show text not occurring in document text content
I Access to attribute values
I Automatic numbering
Example
person:before { content: attr(position); }
phone[type=’office’]:before { content: ’tel. ’; }
phone[type=’mobile’]:before { content: ’kom. ’; }
CSS capabilities and advantages
I Great visual formatting capabilities
I especially for single elements I Elements distinguished basing on
I name I placement within document tree I existence and value of attributes I Support
I internet browsers I authoring tools
I Easy to write simple stylesheets :) CSS lacks and drawbacks
I Only visualisation
I no conversion to other formats I Some visual formatting issues
I e.g. confusing block size computing (moreover, incorrectly performed in the most popular Web browser) I Not possible (CSS Level 2):
I repeating the same content several times I distinguish elements basing on their contents I complex logical conditions I data processing (e.g. summing numeric values) I access to many documents at once I Some things require more work than they should, e.g.:
I to show attribute values I to change elements order
Presentation by transformation
I In order to present a document, we can transform it into a format, that we know how to present. I Examples of “presentable” XML formats:
I XHTML I XSL Formatting Objects Extensible Stylesheet Language (XSL)
I First generation of recommendations (1999 and 2001):
I XSL (general framework, XSL Formatting Objects language) I XSLT (transformation language) I XPath (expression language, in particular paths addressing document fragments) I XSL stylesheet — transformation from XML to XSL-FO
I usually for a class of documents (XML application) I in practice, transformation to other formats, often (X)HTML
XSL Idea
cite: W3C, XSL 1.0 XSLT — status
I Created within XSL
I Applications beyond XML visualisation I Version 1.0:
I October 1999, connected with XPath 1.0 I good support in tools and libraries I Version 2.0:
I January 2007, connected with XPath 2.0 and XQuery 1.0, I new features I less software support
XSLT — support
I XSLT 2.0 processors:
I Saxon
I Java and .NET libraries, command-line applications I free (Open Source) basic version I commercial schema aware version
I XML Spy (commercial windows application) I XSLT 1.0 processors:
I internet browsers I Xalan (Java and C++ libraries) I xsltproc, part of libxml (C, basically for Linux) I XML extensions of database engines I ... I Authoring tools
I raw text editors and programmer environments (e.g. Eclipse) I specialised tools — rather paid (e.g. XML Spy, oXygen). XSLT — sheet example
1/2
Staff list
XSLT — sheet example
2/2
...
I Stylesheet (arkusz) consists of templates.
I Template (szablon) specifies how to transform a source node into result tree fragment I Inside templates:
I text and elements not from XSLT namespace → copied to result I XSLT instructions → affects processing I XPath in instructions → access to source document, arithmetic, conditions testing, . . .
I XSLT can be considered a programming language specialised for XML documents transformation.
XSLT — operation overview
I Transformation on tree level I Template for document node (root) started first
I template exists even if we had not written it
I apply-templates within a template — another templates called for other nodes, usually for children
I Templates matched according to node kind and name, location within document tree, etc.
I Additional features: conditional processing, copying values and nodes from source, generating new nodes, grouping, sorting, . . . XPath in XSLT
I XPath expressions used for:
I accessing nodes and values from source document I computing values I checking logical conditions I XPath constructions:
I paths (navigating through document tree) I arithmetic and logical operators I functions for numbers. strings, etc. I more in version 2.0
I XSLT 1.0 uses XPath 1.0.
I XSLT 2.0 uses XPath 2.0.
XPath in XSLT stylesheet — example
...
I XSL Formatting Objects:
I in line with XSL idea I usable e.g. for printout I HTML and XHTML:
I most popular I usable for Web publication and on-screen view I Arbitrary XML:
I translating to another (or new) format I picking up or processing data (alternative to XQuery) I XSLT as result of XSLT I Plain text
I CSV and other text data formats I scripts or configuration files I plain representation of text documents
Transformations vs styles
cite: Szymon Zioło, XML i nowoczesne technologie zarz ˛adzaniatresci´ ˛a XSL Formatting Objects
I XML application designed for visual presentation I Focused on printed media
I page templates (“masters”) I automatic pagination
I Normally, XSL-FO documents are not written by hand but generated via XSLT.
XSL-FO document structure
Transformation to XSL-FO (1/2)
Transformation to XSL-FO (2/2)
Ksi#gowo##
Dawid Paszkiewicz tel. +48223213203 kom. +48501502503 [email protected]
Monika Dom#a#owicz tel. +48223213200 kom. +48501502513 [email protected]
page-master — page template
I Single page layout
I A document may be split in many such pages.
Example
Content of pages page-sequence results in a number of pages flow content split into pages static-content content repeated on all pages flow-name page region reference
Stylesheet fragment
XSL-FO tree elements
Block level Inline level I block I inline, I list-block, character list-item, list-item-label I external-graphics I table, table-row, table-cell ,... Special features
I basic-link, bookmark
I footnote
I flow
I ... Style of visual elements
I CSS roots of visual formatting model I Attributes for style properties:
I margin, padding, border-style ... I background-color, background-image ... I font-family, font-weight, font-style, font-size ... I text-align, text-align-last, text-indent, start-indent, end-indent, wrap-option, break-before ...
Lists — example
Ksi#gowo## * Stanowisko: starszy referent Imi#: Dawid Nazwisko: Paszkiewicz tel.: +48223213203 kom.: +48501502503 Email: [email protected] * Stanowisko: kierownik Imi#: Monika Nazwisko: Dom#a#owicz tel.: +48223213200 kom.: +48501502513 Email: [email protected]
Tables — example
XSL-FO — support
I Commercial software:
I Antenna House XSL Formatter I RenderX I Ecrion I Lunasil LTD Xinc I ... I Open software:
I Apache FOP I xmlroff XSL-FO — critique
Main XSL-FO advantages
I “in line” with XSL
I easy and direct way to obtain printout (e.g. PDF) from XML data
I general advantages of stylesheets over “hard-coded” formatting
Main XSL-FO disadvantages
I too complex for simple needs I too limited for advanced needs
I lack of pagination feedback, cannot say “if these two elements occur on the same page. . . ” I hard to format particular elements in a very special way (thus this is a general drawback of using stylesheets)