Single Source Publishing with Apache Forrest

by Ferdinand Soethe, Ross Gardler

Apache Forrest harnesses the power of Apache Cocoon to build a single source publishing framework based on common standards (XML). It is easy to use and extend to suit a wide variety of publishing needs. Learn how to get started with Forrest in a few easy steps. Take a closer look at the design concepts behind Forrest. See that it can be easy to publish a single document to very different target media (Web, print, speech, presentation slides) in a consistent and uniform way. Find out about 'smart' slide presentations you can create with Forrest. Take a tour of input plugins for OpenOffice and other common formats, learning how to integrate different document and data sources in a common corporate design. Discover Forrest's potential to quickly embed dynamic data from local and remote systems.

Table of contents

1 Introduction
2 Publishing Facts 2005
3 A Look At Alternatives
4 Facts about Forrest
5 Forrest Publishing Features
6 SSP Power in Detail
7 Forrest for (Your) Business
8 Thank You!
9 Time for Your Questions

Copyright © 2005 Ferdinand Soethe, Ross Gardler & Apache Software Foundation. All rights reserved.

1. Introduction

1.1. Single Source Publishing?

Single Source Publishing ...
• is about creating content for different target media from just a single source (document),
• has the potential to
  • improve media quality,
  • enhance author's satisfaction and
  • save you time and money.

1.2. Goals of this Session

In this session we'll
• talk about Apache Forrest, the affordable publishing solution,
• show you how simple it is to get started,
• explain the power in Forrest's architecture and
• give you a tour of useful applications.

1.3. Here for You today ...

Ferdinand Soethe
• Systems Analyst, Technical Writer and Consultant
• Committer and Member of Forrest PMC
• Currently working on the Smart Presentations PlugIn

1.3.1. Who can I ask?

Feel free to contact me privately by email ([email protected]). Post questions about Apache Forrest to one of the Apache Forrest project mailing lists (http://forrest.apache.org/mail-lists.html).

1.4. Agenda

• Presentation (40 minutes)
  • Introduction
  • Publishing Facts 2005
  • A Look At Alternatives
  • Facts about Forrest
  • Forrest Publishing Features
  • SSP Power in Detail
  • Forrest for (Your) Business
• Questions & Answers (10 minutes)

2. Publishing Facts 2005

2.1. The WYSIWYG-Trap

• Hardly any word processing software lives up to the promise
• Writing and typesetting often are incompatible skills
• WYSIWYG-Documents translate poorly to other media

2.1.1. WYSIWYG: Success or Failure

Wikipedia has the following entry for WYSIWYG:
• A description of a user interface that allows the user to view the end result while the document or graphic character is being created. For example, a user can see on screen how a document will look when printed.
• Allows the user to concentrate entirely on how the content should appear.
• Also refers to the ability of modifying the layout of a document without having to type (and remember) names of layout commands.
• Also used to describe specifically a web-page creation program in which the user creates the web page visually, while the program generates the HTML for it. Often users can also edit this HTML if they so desire.

WYSIWYG is becoming increasingly difficult to realise. This is partly because we often want to use the same content in different environments. To reuse content in a WYSIWYG environment usually means one has to re-edit the layout for each publication. Furthermore, using the same tool for content editing and layout means that the writer of the content must also play the role of layout designer, two skills that are rarely present in any one individual.

WYSIWYG is useful for single-use documents that are unlikely to be processed any further, perhaps a quick, informal letter to a friend. In a business environment, however, it is unusual to create content that will not need further processing.

For more on the history of WYSIWYG and a discussion of its strengths and weaknesses, see What has WYSIWYG done to us? (http://www.ideography.co.uk/library/seybold/WYSIWYG.html), first published in 1996. The author observes:

    how we were seduced by WYSIWYG's illusion of control and how we lowered our expectations and typographic standards and became deeply confused about who in publishing is supposed to do what.

He goes on to conclude:

    Good typography requires a lot more than good-quality typefaces. It also requires improved composition algorithms within publishing software -- both for paper and for the Web.

It is interesting to read this paper today, nearly ten years later. The majority of the issues and problems it identifies are still of great concern. In fact, with the growth of Web publishing and of "user level" tools for "easy" "publishing", the issues raised have become even more important.

2.2. Examples of HTML Export problems with Microsoft Word

• Table Of Contents Become Meaningless
• Inexplicable Whitespace Inserted
• Bad Layout of Objects
• Mixing of Content and Style Information

2.2.1. Additional Information

Here are just a few examples from a single Microsoft Word document exported as an HTML page using the latest edition of MS Word. This document was pulled randomly from the Internet to illustrate the dangers of using a word processor to create content intended for use in any environment other than the originating word processor.

2.2.1.1. Table Of Contents Become Meaningless

When a word processor creates a table of contents it usually inserts a special marker into the content itself, indicating that the contents should be displayed at the given point. However, this is not always the appropriate location or method for creating a table of contents.

Table of Contents Relevant only to Print

The screenshot shows how the chosen document has created a table of contents that refers to the page number each section appears on. But this is a web page rendering, and there are no page numbers. It would be more appropriate to create a table of contents with hyperlinks to the relevant sections of the web page. Alternatively, we may choose to place the table of contents in the navigation menu of the site. With this HTML export it is impossible to do either without editing the exported file. This is hardly Single Source Publishing.
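The difference between the two media can be sketched in a few lines of Python. This is only an illustration of the idea, not how Forrest or Word build tables of contents; the Section class, its fields and both function names are assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """One heading of the source document (hypothetical structure)."""
    title: str
    anchor: str  # id used for hyperlinks in the web rendering
    page: int    # page number assigned by the print layout


def toc_for_print(sections):
    """Print media: titles with dot leaders and page numbers."""
    return [f"{s.title} {'.' * 6} {s.page}" for s in sections]


def toc_for_web(sections):
    """Web media: there are no pages, so each entry links to its section."""
    items = [f'<li><a href="#{s.anchor}">{s.title}</a></li>' for s in sections]
    return "<ul>" + "".join(items) + "</ul>"


sections = [
    Section("Introduction", "introduction", 2),
    Section("Publishing Facts 2005", "publishing-facts", 3),
]

print("\n".join(toc_for_print(sections)))
print(toc_for_web(sections))
```

Both renderings come from the same list of sections, so the source never needs to be edited for a particular medium.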

2.2.1.2. Inexplicable Whitespace Inserted

In this screenshot we can see that, for some strange reason, a large chunk of whitespace has been inserted between the table header and the table itself. This whitespace cannot be seen in the original document, so the editor is clearly failing in its goal of WYSIWYG editing.

Large amounts of meaningless whitespace are inserted

2.2.1.3. Bad Layout of Objects

In this final screenshot we see an example of the word processor's failure to translate the layout of objects correctly to the screen. Notice how the Date object is not aligned with the other metadata elements; in fact, it is hidden behind the image. In addition, notice how the text incorrectly appears to the right of the image and then wraps below it. In the editing environment this text all appears below the image. Again, a complete failure of the editor to provide a WYSIWYG environment.

Objects overlap One Another

2.2.1.4. Mixing of Content and Style Information

Not content with mixing style and content (which we now know is a bad thing), the word processor also seems to find it "necessary" to create hugely complex and bloated embedded style sheets. This creates long load times and prevents central control over the website's style. To complicate things even further, there is often a wild mix of HTML formatting attributes (align), element-local styles and style sheet references. This makes it very hard to predict the outcome or even to edit the styles in a consistent way.

2.3. The Internet Publishing Chore

Publishing to the Internet is a requirement for corporate business and non-profits. It's complex and costly to design Web-Sites that
• look great,
• navigate easily,
• offer full access with a browser of choice,
• and satisfy all standards.

2.3.1. Web-Site Usability

A good web interface is based more on website usability than on graphical rendering. A consistent visual design provides the orientation cues and navigational controls that users critically need. A successful implementation of these usability techniques helps your users find the information they are looking for in the most efficient and pleasing manner.

Unfortunately, maintaining a consistent look and feel is almost impossible when content editors are given WYSIWYG tools. Where one editor will use heading styles, another will simply use bold text.

2.4. The tools dilemma

Most Web-Design tools were designed for marketing rather than publishing, so they
• tweak HTML until pages look good,
• make graphic designers happy,
• fulfill advertising needs.

Unfortunately they usually do not
• satisfy our criteria,
• create good looking print versions.

And easy maintenance is usually low on the list!

2.5. In Summary

As a Result:
• Most documents are prepared for publishing twice.
• The costs are enormous!
• The results are poor!

3. A Look At Alternatives

3.1. Time for a change!

After 20 years of FYO (Format-Your-Own) it's time to look for alternatives based on the requirements of a modern publishing concept:
• Authors know the meaning of their text!
• Typesetting and Web design are professional tasks!
• It makes sense to keep them separate!

Think about cost!

We need to make professional typesetting and Web design affordable!

3.2. Multimedia Publishing Requirements

This leads to three simple requirements:
• Create all media from the same document.
• Automatically apply formatting templates.
• Cut costs of professional typesetting and Web design.

3.2.1. Additional Requirements?

It can certainly be argued that there are many more requirements in a publishing environment. However, these are the ones that are not adequately covered by current publishing tools.

3.3. Consider Single Source Publishing

Single source publishing is about smart document processing:
• Semantic markup adds meaning to text: Ferdinand Soethe [email protected]
  The author need not think about looks!
• Applying professionally designed templates results in superior layouts.

3.3.1. Definition: Single source publishing

From Wikipedia, the free encyclopedia:

Single source publishing or single sourcing allows the same content to be used in different documents and in various formats. For example, a software company may have several products with user guides that share a common procedure, like instructions on how to open a file. Rather than maintain duplicate versions of this procedure (one in each manual) the manuals can share the content, perhaps flowing it into the document at the time of publication. Eliminating duplicate content can save translation costs, reduce maintenance costs, improve consistency and reduce errors.

Single sourcing also allows the creation of documents in various formats from the same content. For example, a company might use the same content in online help, a printed document and a Web page. With a single source solution, the company only has to update the one source file for the content and regenerate the three outputs. Ideally, a single source solution does not require human intervention to customize the formatting or content for the various outputs. This can be difficult to achieve without the use of some kind of content management system.

3.4. Consider SSP (cont)...

• Customizing the general layouts to create individual site design shares template cost among many users.
• Applying different transformations, information can be served to multiple media.

3.4.1. Single Source Publishing

To produce multiple documents from a single source it is necessary to be able to apply templates to the documents. These templates describe how the document should look. In order to do this it is necessary for the typesetting system to know what the content represents. For example, it must be able to tell the difference between the title of a book and the author of that book. Recording this information in the document is known as "markup".

Since authors understand the semantics of the content they write, they can apply "semantic markup" to the content at the point of authoring. The author need not be concerned with how this will be presented in the final published version. This enables authors to adopt editing tools that suit their needs as content editors; they need not encumber themselves with graphical environments designed for typesetting.

Just as the content author is freed from the limitations of a typesetting editor, the professional typesetter need no longer be concerned with content and content editing tools. They need only address the typesetting of content that is correctly "marked up" with its semantic meaning.

Finally, since the content and the layout are completely separate, it is possible to apply different typesetting templates to the same content. This allows the content to be published in multiple formats and layouts. Each of these templates will create a version of the content that is "marked up" with presentational information appropriate for the desired usage of the content.
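As a minimal sketch of this idea, the following Python example renders one semantically marked-up source through two different "templates". It is not Forrest code: the element names (document, title, author, para) and the functions to_html and to_text are assumptions chosen for illustration, and HTML escaping is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Semantically marked-up source: the tags say what the text is,
# not how it should look. Element names are made up for this sketch.
SOURCE = """
<document>
  <title>Single Source Publishing</title>
  <author>Ferdinand Soethe</author>
  <para>Write once, publish to many media.</para>
</document>
"""


def to_html(doc):
    """Web template: the title becomes a heading, the author a byline."""
    title = doc.findtext("title")
    author = doc.findtext("author")
    para = doc.findtext("para")
    return f'<h1>{title}</h1><p class="byline">{author}</p><p>{para}</p>'


def to_text(doc):
    """Plain-text template: the same content, underlined title."""
    title = doc.findtext("title")
    author = doc.findtext("author")
    para = doc.findtext("para")
    underline = "=" * len(title)
    return f"{title}\n{underline}\nby {author}\n\n{para}"


doc = ET.fromstring(SOURCE.strip())
print(to_html(doc))
print(to_text(doc))
```

The author only ever edits SOURCE; changing how titles or bylines look means changing a template, never the content.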

4. Facts about Forrest

4.1. Behind the Scenes

Forrest Processing

4.1.1. Apache Forrest

Apache Forrest is a publishing framework that transforms input from various sources into a unified presentation in one or more output formats. The modular and extensible plugin architecture is based on Apache Cocoon and relevant standards, which separates presentation from content. Forrest can generate static documents, or be used as a dynamic server, or be deployed by its automated facility. For more details see the Apache Forrest website (http://forrest.apache.org).

4.2. SSP and Forrest

The general concept of Single Source Publishing is not new. What's new is Forrest as a free-to-use framework for Single Source Publishing
• with a focus on practical application,
• quick and easy to get started and
• extensible when you need it.

4.2.1. Apache Forrest: A Single Source Publishing Framework

Forrest is far from a quick and dirty solution. Forrest is built on the world's leading XML application framework, Apache Cocoon, which provides advanced users with extremely powerful publishing capabilities.
• Multiple task-specific source XML formats are provided (e.g. How-To, FAQ, change logs and todo lists are supported natively), together with a range of other source formats including the Apache xdocs XML format, plain HTML documents, some Wiki formats, a subset of DocBook, OpenOffice.org and MS Office.
• Multiple output formats are supported, for example HTML and PDF (using Apache FOP).
• Native Scalable Vector Graphics rendering. Simply drop the SVG in the appropriate directory and it will be rendered as PNG.
• Transparent inclusion and aggregation of external content, like RSS news feeds.
• Full extensibility is possible, for example database queries, charting and web services integration.
• Based on Java and XML standards, Forrest is platform-independent, making for a documentation system that is just as portable as the XML data that it processes.

4.3. How to get started

• Install Forrest
• Create a new project with Forrest
• Start Forrest as a server
• Write a document and add it as a new page
• Show the results in the browser.

4.3.1. Getting Help to Get Started

See the Apache Forrest website (http://forrest.apache.org) and the user mailing list for more details.

5. Forrest Publishing Features

5.1. Forrest Publishing Features

Forrest is full of powerful features right from the start! Out of the box you get
• Automatic generation of Web and Print Media
• Site-Navigation
• Support for Accessibility Requirements
• Flexible staging options

5.2. Automatic generation of Web and Print Media

Forrest will automatically generate output in HTML (Web) and PDF (print) format from a single source. In contrast to most legacy systems, this output
• looks good,
• prints well,
• conforms to all important web standards,
• and is fully functional online and offline.

5.2.1. A Running Start

Apache Forrest provides a running start in Single Source Publishing. It will generate a range of output formats together with any necessary usability content, such as navigation menus in web sites and tables of contents in printed materials. It comes preconfigured with templates for HTML and print output formats and will even manage the publication of your content to your chosen server.

5.3. Site-Navigation

Tabs and menus for site navigation are automatically generated from a simple structure, for a site that
• looks good,
• is easy to understand,
• and is tested and works properly with most clients.

5.4. Support for Accessibility Requirements

Forrest already implements many requirements of accessibility guidelines that
• require you to present information in a way that makes access independent of the capabilities and features of your client device,
• are or are about to become law in many parts of the world,
• and are a must for many corporate business sites.

5.5. Flexible staging options

Forrest output can be presented in a variety of ways:
• Static HTML for cheap hosting and easy distribution on CD or DVD,
• Application Server if you need powerful interactive features,
• Build and deploy robot for time delayed updates of static sites.

5.5.1. Delivery of Content

Unique amongst comparable documentation tools, Forrest generates sites that can run both interactively as a dynamic web application, or as statically rendered pages. This is important since some publication formats are static by their nature, for example printed materials will always be static. However, the ability to serve in a dynamic environment enables realtime updating of sites when content is edited (including otherwise static documents, such as PDF). Furthermore, it provides a path for website growth: start off small and static, and if dynamic features (user login, forms processing, runtime data, site search etc.) are one day needed, these can be accommodated by switching to webapp mode.

6. SSP Power in Detail

6.1. Single Source Publishing XXL

Internally Forrest implements SSP in a 21st century software architecture:
• Modular architecture with a flexible plugin concept
• Separate presentation layer and separation of concerns
• Commitment to platform independence and common standards

6.2. Extensibility through PlugIns

• Input-plugins to make use of files in many common data formats.
• Output-plugins to present your content in various common formats.
• Extend functionality with internal plugins to meet authors' needs.

6.2.1. Extensibility

Forrest, at its core, is nothing more than a framework for processing XML documents. However, it has a plugin mechanism that allows features to be added by simply installing any number of plugins. Plugins are installed simply by adding their name to your project's configuration. There are three types of plugin:
• Input Plugins: These read in a file of a specific format, for example DocBook or MS Office, and convert it to a format usable by Apache Forrest.
• Output Plugins: These take the processed files output by Forrest core and convert them into some rendered format, for example XHTML, PDF or plain text.
• Internal Plugins: These extend the internal processing of the data. Internal plugins perform functions like building site navigation structures.
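The division of labour between the three plugin types can be sketched as a small pipeline. The Python below is a conceptual illustration only and does not reflect Forrest's actual plugin API (Forrest plugins are built on Cocoon, not Python); every name in it, including the "csv-line" input format, is hypothetical.

```python
# Registries for the three plugin types. All names are hypothetical;
# this is not Forrest's plugin API.
INPUT_PLUGINS = {}
OUTPUT_PLUGINS = {}
INTERNAL_PLUGINS = []


def input_plugin(fmt):
    """Register a function that reads source format `fmt`."""
    def register(func):
        INPUT_PLUGINS[fmt] = func
        return func
    return register


def output_plugin(fmt):
    """Register a function that renders target format `fmt`."""
    def register(func):
        OUTPUT_PLUGINS[fmt] = func
        return func
    return register


def internal_plugin(func):
    """Register a function that enriches the internal document."""
    INTERNAL_PLUGINS.append(func)
    return func


@input_plugin("csv-line")
def read_csv_line(raw):
    # Input plugin: convert a source format into the common internal form.
    title, body = raw.split(",", 1)
    return {"title": title.strip(), "body": body.strip()}


@internal_plugin
def add_navigation(doc):
    # Internal plugin: add derived data, here a one-entry menu.
    return {**doc, "menu": [doc["title"]]}


@output_plugin("html")
def write_html(doc):
    return f"<h1>{doc['title']}</h1><p>{doc['body']}</p>"


@output_plugin("text")
def write_text(doc):
    return f"{doc['title'].upper()}\n{doc['body']}"


def publish(raw, source_format, target_format):
    """Run input plugin, then all internal plugins, then the output plugin."""
    doc = INPUT_PLUGINS[source_format](raw)
    for step in INTERNAL_PLUGINS:
        doc = step(doc)
    return OUTPUT_PLUGINS[target_format](doc)


print(publish("Hello, Forrest publishes this.", "csv-line", "html"))
```

Because every input plugin produces the same internal form, adding a new source format makes it available in every output format without touching the output plugins.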

The functionality of Apache Forrest can be extended almost limitlessly. The project maintains a list of plugins (see the Apache Forrest website, http://forrest.apache.org) that have been completed or are in development. We are also aware of projects that have built plugins to do things such as execute database queries, read electronic sensors and perform periodic updates of sections of the content.

Note: Plugins are a new feature of the most recent release of Forrest. The project is currently extracting functionality from the core into plugins in order to enable users to build the document publishing system that satisfies their individual needs. However, this is an ongoing process and at the time of writing some functionality scheduled to be moved into plugins still remains in core. This simply means that you may have some features you don't need as part of your initial download.

6.3. Skinnable Interface

Forrest keeps site design separate from content (separation of concerns):
• Content authors don't have to worry about looks: Internet Technologies
• Designers develop and test new looks without interfering with content creation.
• New skins can be applied to all documents in a flash.

6.3.1. Real Life Skinning Applications

Output formats can be "skinned", that is, the look and feel can be changed quickly and easily. As with plugins, skins are external units of functionality and can be packaged and downloaded as and when needed. Alternatively, users can develop their own skins consistent with corporate branding.
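To illustrate what swapping a skin means, here is a small Python sketch using the standard library's string.Template. It is an analogy rather than a description of how Forrest skins are implemented; the skin names, the placeholders and the "Example Corp" value are all assumptions made up for this example.

```python
from string import Template

# Two hypothetical skins: the same placeholders, different presentation.
SKINS = {
    "plain": Template("<html><body><h1>$title</h1>$body</body></html>"),
    "corporate": Template(
        '<html><body class="corp"><div id="banner">$company</div>'
        "<h1>$title</h1>$body</body></html>"
    ),
}


def apply_skin(skin_name, content):
    """Render the unchanged content with the chosen skin."""
    return SKINS[skin_name].substitute(content)


content = {
    "title": "Internet Technologies",
    "body": "<p>Content goes here.</p>",
    "company": "Example Corp",
}

for name in SKINS:
    print(apply_skin(name, content))
```

The content dictionary is never modified; choosing a different key in SKINS is all it takes to change the look of every page.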

6.3.1.1. Some Example Skins Developed By Forrest Users

• verit skin (http://www.verit.de/)
• XML Belux skin (http://www.xmlbelux.be/)
• Dream Models skin (http://www.dream-models.com)
• Outerthought skin (http://www.outerthought.net)
• CESE skin (http://cese.sourceforge.net)

7. Forrest for (Your) Business

7.1. Making the most of Forrest and SSP ...

A few tips for a good start:
• Forrest Licensing
• Rules of Open Source
• Getting Bug Fixes and Support
• Is Forrest a Reliable Platform?

7.2. Forrest Licensing

What does the Apache License mean?
• You can use Forrest free of charge!
• It is OK to use for commercial projects (with very few limitations).
• Your extensions and modifications will not have to become Open Source.
• You could even create new commercial software based on Forrest.

7.2.1. Apache Software License (Version 2.0)

http://www.apache.org/licenses/LICENSE-2.0

The 2.0 version of the Apache License was approved by the ASF in 2004. The goals of this license revision have been to reduce the number of frequently asked questions, to allow the license to be reusable without modification by any project (including non-ASF projects), to allow the license to be included by reference instead of listed in every file, to clarify the license on submission of contributions, to require a patent license on contributions that necessarily infringe the contributor's own patents, and to move comments regarding Apache and other inherited attribution notices to a location outside the license terms (the NOTICE file).

The result is a license that is supposed to be compatible with other open source licenses while remaining true to the original goals of the Apache Group and supportive of collaborative development across both nonprofit and commercial organizations. The Apache Software Foundation is still trying to determine if this version of the Apache License is compatible with the GPL.

All packages produced by the ASF are implicitly licensed under the Apache License, Version 2.0, unless otherwise explicitly stated. More developer documentation on how to apply the Apache License to your work can be found in Applying the Apache License, Version 2.0.

7.3. Rules of Open Source

As Open Source development is a volunteer effort, it follows different rules from Closed Source Software:
• You are free to use what is already there ...
• and welcome to add what is missing,
• or spend some money (saved on the license) to have someone else add it for you.

• Forrest - just like Closed Source Software - has bugs ...
• but we are committed to fixing them,
• and have no sales department to push us into hasty releases.

7.4. Getting Bug Fixes and Support

• There is no Forrest hotline ...
• but there are mailing lists (http://forrest.apache.org/mail-lists.html) where you can ask any question you have about Forrest,
• discuss your questions with the people who build Forrest,
• and find a large, helpful community.

• If a bug is bugging you ...
• do something about it: help us fix it ...
• and you will find a lot of people chipping in.

7.4.1. Contributing

The Forrest Project is an Open Source (http://www.opensource.org/) volunteer project released under a very liberal license. This means there are many ways to contribute to the project, either with direct participation (coding, documenting, answering questions, proposing ideas, reporting bugs, suggesting bug fixes, etc.) or by resource donations (money, time, publicity, hardware, software, conference presentations, speeches, etc.).

To begin with, we suggest you subscribe to the Forrest mailing lists (http://forrest.apache.org/mail-lists.html) (follow the link for information on how to subscribe and to access the mail list archives). Listen in for a while, to hear how others make contributions.

You can get your local working copy of the latest and greatest code (which you find in the Forrest module in the SVN code repository). Review the todo list and choose a task (or perhaps you have noticed something that needs patching). Make the changes, do the testing, generate a patch, and post it to the developer mailing list. (Do not worry: the process is easy and explained on the Forrest website.)

Document writers are usually the most wanted people, so if you would like to help but you're not familiar with the innermost technical details, don't worry: we have work for you!
• How The Apache Software Foundation Works (http://www.apache.org/foundation/how-it-works.html)
• Contributing to Forrest (http://forrest.apache.org/contrib.html)
• Forrest mailing lists (http://forrest.apache.org/mail-lists.html)

7.5. A reliable platform?

Trust Forrest with mission-critical business applications if you follow a few simple rules:
• Plan to spend some of the license fees saved.
• Don't assume that Open Source Software is easier to learn.
• Take your time introducing Forrest.
• Help us make Forrest better.

8. Thank You!

• Presentation
• Questions & Answers (10 min)

9. Time for Your Questions

We have only 10 minutes
• Try to keep it short
• Give others a chance to ask their questions

For further information check out:
• http://forrest.apache.org
• http://forrest.apache.org/mail-lists.html
