XSL Formatting Objects (XSL-FO), Part 1: Get the basics of XSL-FO techniques
Convert HTML documents into formatting objects and then into PDF files
Skill Level: Introductory
Doug Tidwell, developerWorks Cyber Evangelist, IBM
04 Feb 2003
Through examples and illustrations, this tutorial for developers teaches the basics of working with XSL Formatting Objects (XSL-FO), a powerful, flexible XML vocabulary for formatting data, often used with XSLT to convert XML and HTML documents to PDF (Portable Document Format). This tutorial, Part 1 of a two-part series, shows how to use XSLT to convert XML documents into formatting objects and then how to use the Apache XML Project's FOP (Formatting Objects Processor) tool to convert those formatting objects into PDF files. Examples include XSL-FO sample code, XSLT templates, and some Java commands for the processing.
Section 1. Tutorial introduction and preparation
What this tutorial covers
The XSL Formatting Objects specification, an official recommendation of the W3C that is commonly known as XSL-FO, defines a number of XML tags that describe how something should be rendered. Although XSL-FO contains elements that describe how to render text in nonprint formats such as spoken text, this tutorial introduces how to create portable document format (PDF) files -- the most common use of XSL-FO.
The tutorial provides a brief overview of the XSL-FO document structure, as well as the elements that define page sizes, fonts, and margins. It also explains the basics of text and graphic formatting and demonstrates the fundamentals of converting a formatting object file to PDF. Downloadable code samples make it easy to adapt samples to experiment on your own.
© Copyright IBM Corporation 1994, 2008. All rights reserved. developerWorks® ibm.com/developerWorks
When you've completed this introductory tutorial, you'll understand what XSL-FO is and how it works. You'll be able to adapt the basic samples provided to create simple FO documents of your own. You'll be ready to move on to the second tutorial in this series to find out how to control text formatting in detail and how to convert HTML elements to formatting objects. Then you will be able to create your own XML applications that use formatting objects to generate high-quality printable documents.
What you need to know to benefit from this tutorial
This tutorial assumes that you already understand the Extensible Markup Language (XML) and how to work with it and its related technologies, such as Extensible Stylesheet Language Transformations (XSLT). You don't need to know anything about XSL-FO yet, but to work with formatting objects, you need a little experience working with XSLT.
The tools used for the examples are written in the Java language, but you don't have to understand the Java language to use them.
What you need to know about the software and standards
Figure 1. FOP project logo
Although you can use other XSL-FO rendering engines, this tutorial is written for the Apache XML Project's FOP (Formatting Objects Processor), which renders formatting objects to PDF. The examples in this tutorial work with FOP Version 0.20.4, which was released on July 5, 2002. If you try them with other versions of FOP, they may or may not work. The XSL-FO spec became an official recommendation of the W3C on 15 October 2001; the FOP tool supports most of the final spec.
We use the FOP tool at developerWorks for two reasons:
• It's written in the Java language, and so it runs on all the platforms that we care about.
• It's a no-cost, open-source product, and so anyone can afford it.
If you want to immerse yourself in XSL-FO, you can go directly to the source for the spec at the W3C's site (see Resources). Be aware that this is one of the longest documents at the W3C (roughly 400 pages), although most of it is reference information for the many elements and attributes in the XSL-FO tag set. The reference sections -- particularly appendixes B, C, and D -- are very useful for looking up property names and values. Remember, as of this writing, FOP does not completely support the XSL-FO spec. Certain property names and values defined by the spec might not be supported by the tool, or they might be supported with slightly different names and values.
What tools you'll need for the tutorial and how to configure them
To go through the exercises in this tutorial, you'll need a Java Development Kit (JDK) Version 1.3 or later, as well as the FOP package from the Apache XML Project (see Resources). Download the latest version and unzip it.
Once you have the JDK and FOP installed, you need to set the classpath.
If you want to follow the examples in the tutorial without remembering to adapt them, put the FOP package at c:\fop-0.20.4rc and then set the classpath like this, all on one line:
set classpath=.;c:\fop-0.20.4rc\build\fop.jar;c:\fop-0.20.4rc\lib\avalon-framework-cvs-20020315.jar;c:\fop-0.20.4rc\lib\batik.jar;c:\fop-0.20.4rc\lib\xalan-2.3.1.jar;c:\fop-0.20.4rc\lib\xercesImpl-2.0.1.jar;c:\fop-0.20.4rc\lib\xml-apis.jar;
If you unzip the FOP package somewhere else, you'll need to change the command accordingly. If you're running Linux, where environment variable names are case-sensitive and classpath entries are separated by colons, use the command export CLASSPATH=/usr/bin/fop-0.20.4rc/build/fop.jar:/usr/bin/fop-... and so on.
Section 2. XSL-FO document function and structure
XSL-FO document overview
An XSL-FO document defines several things that are important when producing high-quality printable documents:
• Information about the physical size of the page (letter, A4, and so on)
• Information about margins (top, left, bottom, and right), running headers and footers, and other properties of the page
• Information about fonts, font sizes, colors, and other characteristics of the text
• The actual text to be printed, marked up with elements that describe paragraphs, highlighting, tables, and similar things
This section of the tutorial covers the paper size, margins, and other page properties, along with the XSL-FO elements used to describe them.
Before going on to the actual elements, let's look at the process used to convert an XML document into a PDF file.
Converting XML documents to PDF files
Converting an XML document to a PDF file takes two basic steps:
1. Use an XSLT stylesheet to transform the XML document into a file of XSL-FO elements. To perform the transformation, you simply invoke the XSLT processor with the XML document and the stylesheet. (Part 2 of this tutorial includes an XSLT stylesheet that converts XHTML elements into formatting objects.)
2. Use a rendering engine (for example, FOP, which is used in the tutorial examples) to convert the XSL-FO elements into a PDF file. This part is even simpler: You just invoke the FOP tool, giving it the name of the XSL-FO file and the name of the PDF file.
Here's a picture that outlines the process:
Figure 2. FOP process diagram
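To make step 1 a little more concrete, here is a minimal sketch of an XSLT stylesheet that turns a tiny XML document into formatting objects. It is not the stylesheet from Part 2: the input element names doc and para, the master name main, and the page dimensions are all assumptions chosen only for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Hypothetical input: <doc><para>...</para><para>...</para></doc> -->

  <!-- The root of the input becomes the skeleton of the XSL-FO document -->
  <xsl:template match="/doc">
    <fo:root>
      <fo:layout-master-set>
        <!-- Page geometry is an assumption: US letter with one-inch side margins -->
        <fo:simple-page-master master-name="main"
            page-width="8.5in" page-height="11in"
            margin-top="36pt" margin-bottom="36pt"
            margin-left="72pt" margin-right="72pt">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="main">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="para"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- Each paragraph in the input becomes one block in the output -->
  <xsl:template match="para">
    <fo:block font-size="12pt" space-after="12pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

Running an XSLT processor with this stylesheet and a matching input document would produce the XSL-FO file that step 2 hands to the rendering engine.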
XSL-FO document structure at a glance
This picture plainly illustrates how an XSL-FO document is structured:
Figure 3. Structure of an XSL-FO document
The
XSL-FO document structure in detail
Start by looking at a simple XSL-FO document, simple.fo (also in the download file, x-xslfo-tutorial-samples.zip), and the tags and attributes it contains. Although it looks very complicated, don't be intimidated -- most of the things in this file never change. Normally you don't think about page layouts for every project; you just start by creating a set that works and then use them repeatedly.
Look first at the
If you want to see what the PDF version looks like, view the file simple.pdf (also in the download file, x-xslfo-tutorial-samples.zip).
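As a point of reference while reading the element descriptions, here is a minimal sketch of a complete XSL-FO document. It is not the simple.fo file from the download; the master name main, the top and bottom page margins, and the sample text are assumptions, while the letter page size, the 72-point side margins, and the 50-point body margins follow values mentioned in this tutorial.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Page layouts: defined once, then referenced by name -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="main"
        page-width="8.5in" page-height="11in"
        margin-top="36pt" margin-bottom="36pt"
        margin-left="72pt" margin-right="72pt">
      <fo:region-body margin-top="50pt" margin-bottom="50pt"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- Content: flows into the body region of the "main" page master -->
  <fo:page-sequence master-reference="main">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="12pt" line-height="15pt">
        Hello, XSL-FO!
      </fo:block>
    </fo:flow>
  </fo:page-sequence>

</fo:root>
```

The layout half rarely changes from project to project; most of the work happens inside the flow.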
The
The root element for an XSL-FO document is the
Typically, the root element contains a
The
The
The
The
margin-left="72pt" margin-right="72pt">
• master-name Defines a name for this page master. You can create several different
• margin-top and margin-bottom Define the margins at the top and bottom of the page. Acceptable units are points, picas, inches, and centimeters.
• page-width and page-height Define the size of the physical page. This example defines a letter-sized page; to use A4-sized paper, the attributes page-width="21cm" and page-height="29.7cm" would do the trick.
• margin-left and margin-right Define the margins at the left and right side of the page.
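The attributes above can be sketched side by side for the two paper sizes mentioned. This is an illustrative fragment only: the master names letter and a4 and the 36-point top and bottom margins are assumptions, and the namespace declaration is included so the fragment stands alone.

```xml
<fo:layout-master-set xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- US letter: 8.5 by 11 inches -->
  <fo:simple-page-master master-name="letter"
      page-width="8.5in" page-height="11in"
      margin-top="36pt" margin-bottom="36pt"
      margin-left="72pt" margin-right="72pt">
    <fo:region-body/>
  </fo:simple-page-master>

  <!-- A4: 21 by 29.7 centimeters, same margins -->
  <fo:simple-page-master master-name="a4"
      page-width="21cm" page-height="29.7cm"
      margin-top="36pt" margin-bottom="36pt"
      margin-left="72pt" margin-right="72pt">
    <fo:region-body/>
  </fo:simple-page-master>

</fo:layout-master-set>
```

Because each master has its own name, a page sequence can pick either one without any change to the content.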
Before looking at the
Units in XSL-FO documents
XSL-FO supports these units for length properties such as margin-left, page-width, and page-height:
Unit  Meaning
cm    centimeters
mm    millimeters
in    inches
pt    points (72 points = 1 inch)
pc    picas (12 points = 1 pica, 6 picas = 1 inch)
px    pixels (sometimes different from one formatter or device to the next, so be careful)
em    the width of a capital M
For more details, including how pixels work, you can read the XSL-FO spec (see Resources).
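As a quick worked example of the conversions in the table, the blocks in this sketch are all indented by the same distance, one inch, written in five different units (1 in = 72 pt = 6 pc = 2.54 cm = 25.4 mm). The choice of the start-indent property and the sample text are assumptions made for illustration.

```xml
<fo:flow flow-name="xsl-region-body"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Every block below starts one inch in from the edge of the body region -->
  <fo:block start-indent="1in">Indented 1 inch</fo:block>
  <fo:block start-indent="72pt">Indented 72 points</fo:block>
  <fo:block start-indent="6pc">Indented 6 picas</fo:block>
  <fo:block start-indent="2.54cm">Indented 2.54 centimeters</fo:block>
  <fo:block start-indent="25.4mm">Indented 25.4 millimeters</fo:block>
</fo:flow>
```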
The
The XSL-FO spec defines five regions on a page; region-body defines the dimensions of the main area in the center of the page. Here's a sample:
This element defines top and bottom margins of 50 points for the region-body area. The other four regions of the page are:
• region-before, the area at the top of the page (normally used for running heads)
• region-after, the area at the bottom of the page (normally used for running feet)
• region-start, the area to the left of the page
• region-end, the area to the right of the page
Here's how these regions are typically arranged on the page:
Figure 4. Page regions diagram
You define the properties of each region with the appropriate
Note: These definitions assume that the text in your document goes from left to right and top to bottom. If you're using a language whose characters are written some other way, the four outer regions may refer to different sections of the page.
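The relationship between the body and the outer regions can be sketched as follows. The 50-point body margins match the sample described above; the master name, the page margins, and the 40-point extent values for the header and footer regions are assumptions.

```xml
<fo:simple-page-master master-name="with-header-footer"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    page-width="8.5in" page-height="11in"
    margin-top="36pt" margin-bottom="36pt"
    margin-left="72pt" margin-right="72pt">
  <!-- The body's margins reserve room for the regions above and below it -->
  <fo:region-body margin-top="50pt" margin-bottom="50pt"/>
  <!-- extent is the depth of each outer region; it fits inside the 50pt margin -->
  <fo:region-before extent="40pt"/>
  <fo:region-after extent="40pt"/>
</fo:simple-page-master>
```

Keeping each extent smaller than the matching body margin is what stops the running head or foot from overlapping the main text.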
The
The
The master-reference here refers to the master-name of the
The
The
The example dictates that this
The area named by the flow-name attribute has to be one of the five default names or a name you define somewhere in your XSL-FO document; you'll see how to give names to page areas in the second part of this tutorial series. For now, xsl-region-body is all you need to worry about.
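A minimal sketch of how these names connect: the page sequence points at a page master by name, and the flow points at a region by name. The master name main and the sample text are assumptions; the five default region names defined by the XSL-FO spec are listed in the comment.

```xml
<!-- master-reference must match the master-name of a page master -->
<fo:page-sequence master-reference="main"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Default region names: xsl-region-body, xsl-region-before,
       xsl-region-after, xsl-region-start, xsl-region-end -->
  <fo:flow flow-name="xsl-region-body">
    <fo:block>This text is placed in the body region of each page.</fo:block>
  </fo:flow>
</fo:page-sequence>
```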
Basic XSL-FO elements for content
The two main XSL-FO elements for formatting content are
element. A
Your first PDF file
To create a PDF file from the example document, you use the FOP tool to convert simple.fo into simple.pdf. Assuming that you've set the classpath correctly (described in What tools you'll need for the tutorial and how to configure them), you can run the FOP tool to convert your file with this command:
> java org.apache.fop.apps.Fop simple.fo simple.pdf
Congratulations! You've just created your first PDF file from XSL Formatting Objects! If you feel like experimenting right away, you can add more elements to the
Figure 5. Picture of PDF with bold and monospaced font
Now you're ready to learn more about formatting documents with XSL-FO.
Section 3. Basic text formatting
Basic block formatting
Now that you've covered some of the basics of the
This example uses the line-height property to change the spacing between lines. Without this property, the line-height would be the same as the font-size. It's usually a good idea to make the line-height 3 to 6 points larger than the font-size; without a little space between the lines, the text may look cramped and hard to read. The amount of space required depends upon the characteristics of the font and the width of the text column; if you're working with a graphic designer on your team, follow your expert's advice on this value.
Text formatting with the
Here's how you can use the XSL-FO
• Bold text: Use the
A word about properties
Even the simple XSL-FO file uses several properties, such as font-size, line-height, and font-style. Many XSL-FO properties are identical to the CSS properties you might recognize.
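To make the block and inline formatting from this section concrete, here is a minimal sketch of a paragraph with a 12-point font on a 15-point line, containing bold, italic, and monospaced runs. The font choices and the sample text are assumptions; the property names (font-size, line-height, font-weight, font-style, font-family) are the same ones CSS uses.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
    font-family="serif" font-size="12pt" line-height="15pt">
  <!-- line-height is 3 points larger than font-size to open up the lines -->
  This paragraph contains
  <fo:inline font-weight="bold">bold text</fo:inline>,
  <fo:inline font-style="italic">italic text</fo:inline>, and
  <fo:inline font-family="monospace">monospaced text</fo:inline>.
</fo:block>
```

None of the inline elements repeats font-size, yet all of them are set at 12 points, because the value is picked up from the enclosing block.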
XSL-FO and CSS also share the way that elements usually inherit properties from their ancestors. Here's the example paragraph again:
Note that in defining the
Another thing to keep in mind about properties: An XSL-FO property is just an XML attribute. You'll notice that I use the words properties and attributes interchangeably in this tutorial. The XSL-FO spec does this as well.
Character entities
Here's one more aspect of text formatting to remember: character entities. Unlike HTML, XSL-FO doesn't define character entities. That means you have to define a character entity any time you want to use it. Here's the syntax for defining a character entity:
]>
This code defines the entity and associates it with a particular character. (The first word after the DOCTYPE keyword must be the name of the root element of the document.) Once you've defined the character entity, you can use it in an XSL-FO document just as you do in an HTML document. You will see another example of the use of character entities in XSL-FO when you read about unordered lists in the follow-up to this tutorial, "XSL Formatting Objects (XSL-FO) advanced techniques."
Section 4. Text block spacing and alignment
Text alignment
Two attributes define the alignment of text in a block. The text-align attribute defines how lines of text are aligned, and the text-align-last attribute lets you define special handling for the last line of text in a block. Both attributes have the same set of values: start, center, end, and justify. For languages that are written left-to-right, text-align="start" produces left-aligned text, and text-align="end" produces right-aligned text. For languages that are written in other directions, start and end have other meanings.
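The two ideas just covered, declaring a character entity and aligning text, can be combined in one small document. This is a sketch under stated assumptions: the entity name copy (mapped to character 169, the copyright sign), the master name main, the page size, and the sample text are all chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The name after DOCTYPE matches the root element, fo:root -->
<!DOCTYPE fo:root [
  <!ENTITY copy "&#169;">
]>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="main"
        page-width="8.5in" page-height="11in"
        margin-left="72pt" margin-right="72pt">
      <fo:region-body margin-top="50pt" margin-bottom="50pt"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="main">
    <fo:flow flow-name="xsl-region-body">
      <!-- Centered line that uses the entity declared above -->
      <fo:block text-align="center" font-size="12pt">
        Copyright &copy; Example Corporation
      </fo:block>
      <!-- Justified paragraph whose last line is aligned to the start edge -->
      <fo:block text-align="justify" text-align-last="start" font-size="12pt">
        Every line of this paragraph except the last one is stretched
        to fill the full width of the column.
      </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```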
Here's how a centered paragraph looks in XSL-FO:
Space between blocks
Two sets of properties define how much space (if any) should appear between blocks: space-before and space-after. You can add the suffixes .minimum, .maximum, .optimum, and .precedence to modify these properties. This table outlines some sample formatting with these properties and describes the effect of the suffixes:
XSL-FO sample Meaning
The XSL-FO spec defines all of the attributes and the rules for determining preferences when the various spacing properties conflict with each other. In this tutorial, the examples usually specify the nonmodified value and leave it at that. Combining the properties and their components, you have 10 choices in total:
space-before
space-before.minimum
space-before.maximum
space-before.optimum
space-before.precedence
space-after
space-after.minimum
space-after.maximum
space-after.optimum
space-after.precedence
Break prevention within and between blocks
The XSL-FO spec defines a few properties that give the rendering engine hints about how to keep blocks of content together. There are three sets of properties here: keep-with-next, keep-with-previous, and keep-together. You can combine the base properties with the components .within-line, .within-column, and .within-page.
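A heading is a typical place where the spacing and keep properties meet, so here is a minimal sketch that uses both. All of the lengths and the sample text are assumptions, and a given rendering engine may not honor every keep.

```xml
<fo:flow flow-name="xsl-region-body"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Heading: flexible space above, fixed space below,
       and a request to stay on the same page as the block that follows -->
  <fo:block font-size="18pt" font-weight="bold"
      space-before.minimum="12pt"
      space-before.optimum="18pt"
      space-before.maximum="24pt"
      space-after="6pt"
      keep-with-next.within-page="always">
    A section heading
  </fo:block>

  <!-- Paragraph: ask the formatter not to split it across pages -->
  <fo:block font-size="12pt" line-height="15pt"
      space-after="12pt"
      keep-together.within-page="always">
    The first paragraph of the section, which should not be separated
    from its heading or broken in two by a page break.
  </fo:block>

</fo:flow>
```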
This table shows some sample formatting that uses these properties and describes the effect of their use:
XSL-FO sample Meaning
As with the space-before and space-after properties and their components, there are a number of combinations:
keep-with-next
keep-with-next.within-line
keep-with-next.within-column
keep-with-next.within-page
keep-with-previous
keep-with-previous.within-line
keep-with-previous.within-column
keep-with-previous.within-page
keep-together
keep-together.within-line
keep-together.within-column
keep-together.within-page
The valid values for these properties are auto, which lets the renderer decide when to keep lines together, and always, which says that two blocks should always be kept together. You can also use an integer value; the higher the number, the greater the precedence of the property (always is higher than any integer). Having said all of this, be aware that FOP doesn't always handle these properties correctly, so don't expect them to work every time.
Break placement before and after blocks
XSL-FO also has properties that tell the renderer how to break blocks apart. The break-before attribute has five values:
Attribute value  What the value does
auto             Let the rendering engine figure it out
column           Put a column break before this block
page             Put a page break before this block
odd-page         The rendering engine inserts a page break (or two, if necessary) so that this block begins on an odd-numbered page. In other words, if one page break would cause this block to begin on an even-numbered page, FOP inserts a second page break.
even-page        The rendering engine inserts a page break (or two, if necessary) so that this block begins on an even-numbered page.
There is also a break-after attribute that has the same five values to specify breaks that come after the current block.
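The break properties described above can be sketched like this. The sample text and the idea of starting chapters on odd-numbered pages are assumptions used only to show the attribute values in context.

```xml
<fo:flow flow-name="xsl-region-body"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <fo:block>The last paragraph of the introduction.</fo:block>

  <!-- Always begins on an odd-numbered page, as chapter openings often do -->
  <fo:block break-before="odd-page" font-size="18pt" font-weight="bold">
    Chapter 1
  </fo:block>

  <fo:block>The text of the chapter.</fo:block>

  <!-- Whatever follows this block starts on a new page -->
  <fo:block break-after="page">
    The last paragraph of the chapter.
  </fo:block>

  <fo:block>This block begins at the top of the next page.</fo:block>

</fo:flow>
```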
Widows and orphans
The final controls for keeping lines together are the orphans and widows properties. Widows and orphans are single lines or partial lines from the beginning or end of a paragraph that appear all by themselves because a page or column break interrupts a text block in an awkward place. In FO you can specify how many lines of a block should stay together before or after a break. The orphans property defines the minimum number of lines that must appear together at the bottom of a page; the default value is 2. The widows property defines the minimum number of lines that must appear together at the top of a page. Its default value is 2 as well.
Section 5. Basic graphics
GIF and JPEG graphics
To add graphics to a PDF file through the FO file, use the
You can use this element to embed GIF and JPEG images in PDF files. The XSL-FO spec also defines the height and width attributes; those attributes help the FOP engine figure out how much room the graphic requires. A final note: The
SVG graphics
FOP now includes the Batik SVG engine (see Resources) for rendering SVG (Scalable Vector Graphics) inside PDF files. If the graphic is in an external file, you can include it with the
If you want to generate SVG directly from your source data, you can create a style sheet that creates both the XSL-FO elements and the SVG elements. Both of them will then be processed by the FOP engine.
Drawing lines
The XSL-FO spec also defines a
This table outlines three ways to use the
FO leader sample The result
A horizontal line that fills the width of the current
A horizontal line 100 points long
element), to draw lines for fill-in-the-blank forms, and to draw dotted lines between headings and page numbers in the table of contents.
A horizontal line of dots that fills the current
The valid values for the leader-pattern property are space, rule, and dots. The default value is space, meaning the
As far as I know, it isn't possible to add vertical rules to blocks; you have to use SVG for that.
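Stepping back over this section, here is a minimal sketch of an external graphic followed by the three kinds of horizontal leaders described above. The file name logo.gif, the image dimensions, and the rule thickness are assumptions, and how closely a particular FOP version follows these property values may vary.

```xml
<fo:flow flow-name="xsl-region-body"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- An external GIF; height and width tell the formatter how much room to reserve -->
  <fo:block>
    <fo:external-graphic src="url(logo.gif)" width="2in" height="1in"/>
  </fo:block>

  <!-- A solid rule that fills the width of the current block -->
  <fo:block>
    <fo:leader leader-pattern="rule" leader-length="100%" rule-thickness="1pt"/>
  </fo:block>

  <!-- A solid rule exactly 100 points long -->
  <fo:block>
    <fo:leader leader-pattern="rule" leader-length="100pt"/>
  </fo:block>

  <!-- A row of dots that fills the width of the current block -->
  <fo:block>
    <fo:leader leader-pattern="dots" leader-length="100%"/>
  </fo:block>

</fo:flow>
```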
For a complete example that covers all the formatting introduced in this tutorial, move on to Summary and resources.
Section 6. Summary
Summing up by example
Now that you've looked at blocks, inline elements, and graphics, it would be worthwhile for you to view the file blocks.fo (also in the download file, x-xslfo-tutorial-samples.zip). It contains various kinds of text formatting, external graphics, and an inline SVG graphic. There are also lines between sections of the page. Here's a fragment of the PDF file generated from the file:
Figure 6. Screen capture of blocks.pdf sample file
You can also view the PDF file blocks.pdf (also in the download file, x-xslfo-tutorial-samples.zip) to see the entire result.
Downloads
Description   Name                          Size  Download method
Sample files  x-xslfo-tutorial-samples.zip  9KB   HTTP
Resources
Learn
• Move on to the sequel to this tutorial, XSL-FO advanced techniques, which covers in detail how to format complex documents, lists, tables, and cross-reference links, plus how to convert HTML elements to formatting objects.
• Read up on the XSL-FO spec (all 400 pages) at the W3C's site, w3.org/TR/xsl.
• For more information about XSL-FO in general, visit the W3C's Style page at w3.org/style/xsl/.
• Find out more about the Batik SVG engine for rendering SVG (Scalable Vector Graphics) inside your PDF files at http://xml.apache.org/batik/. Batik is included in the version of FOP used for this tutorial.
• Nicholas Chase's tutorial "Introduction to Scalable Vector Graphics" is a great place to learn the basics of SVG (developerWorks, March 2004).
• To see another application of XSL-FO, check out Rodolfo Raya's article for developerWorks, "Using XSL-FO to create printable documents." The article explains how the author uses XSL-FO to generate printable database reports from a Java language application (developerWorks, November 2001).
• Find more information on the technologies covered in this tutorial at the developerWorks XML zone.
• Find out how you can earn IBM XML 1.1 certification.
• For additional tips and how-to advice on using XSLT, skim the articles already published on that technology in the developerWorks XML zone.
• For additional instruction in XSLT and other XML techniques, check out the dozens of other tutorials in the XML zone on developerWorks.
• Stay current with developerWorks technical events and Webcasts.
Get products and technologies
• Download Xalan, the XSLT processor used in the examples in this tutorial, from the Apache XML Project at http://xml.apache.org/xalan-j/index.html.
• The examples in this tutorial work with FOP Version 0.20.4, which was released on July 5, 2002. Download the Apache XML Project's FOP package.
• Build your next development project with IBM trial software, available for download directly from developerWorks.
About the author
Doug Tidwell
Doug Tidwell is the developerWorks Cyber Evangelist, helping people use new technologies to solve problems. He has spoken about Web Services and XML to tens of thousands of developers around the world, a number of whom actually stayed awake. He is also the author of O'Reilly's XSLT and a coauthor of O'Reilly's Programming Web Services with SOAP, both of which make excellent gifts for your friends and loved ones.
In a rare brush with greatness, he and his daughter Lily were once trounced by Olympic gold medalist Marion Jones in a game of Whack-a-Mole at the North Carolina State Fair:
• Dad: How can the game be over already?
• Lily: No way! Nobody could be that fast! [Dad looks over, sees winning contestant]
• Dad: Come on, Lily, let's go....