Using SVG as the Rendering Model for Structured and Graphically Complex Web Material

Julius C. Mong and David F. Brailsford
Electronic Publishing Research Group
School of Computer Science & IT
University of Nottingham
Nottingham NG8 1BB, UK
{jxm,dfb}@cs.nott.ac.uk

ABSTRACT

This paper reports some experiments in using SVG (Scalable Vector Graphics), rather than the browser default of (X)HTML/CSS, as a potential Web-based rendering technology, in an attempt to create an approach that integrates the structural and display aspects of a Web document in a single XML-compliant envelope.

Although the syntax of SVG is XML-based, the semantics of the primitive graphic operations more closely resemble those of page description languages such as PostScript or PDF. The principal usage of SVG, so far, is for inserting complex graphic material into Web pages that are predominantly controlled via (X)HTML and CSS.

The conversion of structured and unstructured PDF into SVG is discussed. It is found that unstructured PDF converts into pages of SVG with few problems, but attempts to map the structural components of a structured PDF into an XML skeleton underlying the corresponding SVG are beset by difficulties. These difficulties are not fundamentally syntactic; they arise largely because browsers are innately bound to (X)HTML/CSS as their default rendering model. Some suggestions are made for ways in which SVG could be more fully integrated into browser functionality, with the possibility that future browsers might be able to use SVG as their default rendering paradigm.

Categories and Subject Descriptors

E.1 [Data]: Data Structures - Trees; I.7.2 [Document and Text Processing]: Document Preparation - Markup languages; I.7.4 [Document and Text Processing]: Electronic Publishing.

General Terms

Algorithms, Documentation, Experimentation.

Keywords

PDF, SVG, XML, vector graphics.

1. INTRODUCTION

Current Web technologies are routinely based around simple HTML (or, more recently, its cleaned-up XML-compliant version called XHTML). For enhanced Web page styling that goes beyond the XHTML defaults there is the option to use the Cascading Stylesheets (CSS) facilities. CSS can also be used to style arbitrary XML-based documents within the current generation of Web browsers.

From the outset, Web browsers have supported only simple bitmap formats for graphical material, such as GIF and JPEG, despite the fact that raster files in these formats are large and the resulting bitmap graphics are resolution-dependent. The World Wide Web Consortium (W3C), aware of the need for a flexible vector graphics format for the Web, set up a working group in 1998, charged with drawing up draft proposals for Scalable Vector Graphics (SVG) to be expressed in XML-compliant syntax. The advantages of vector graphics in terms of graphical precision and scalability become very apparent in material such as maps, line-diagrams and block schematics. The involvement of Adobe Systems Inc on the SVG working group did much to ensure that, although SVG was syntactically XML-compliant, the actual graphical semantics of its behaviour resembled, fairly closely, those of Adobe's own PostScript and PDF page description languages.

SVG has predefined graphics elements, such as lines, rectangles and circles, which can be combined to create many simple diagrams. For more sophisticated designs, operations are available for specifying an individual object’s fill rule, colour space, transformation matrix, etc. SVG is also capable of placing text strings at arbitrary positions within the display of material on the screen. Indeed there is little to prevent it being used as a page layout language, very much like PostScript itself. Furthermore, since SVG code is itself text based, it is possible for search engines to easily index and search within any SVG material, for example to extract city names from an SVG map of the world.

2. INVESTIGATION

Despite the fact that SVG is an application of XML, it is an unusual case in that its tags describe low-level graphic primitives rather than the abstract logical structure more generally modelled in other XML applications. Thus some other XML-based tagset has to be devised to delineate the higher-level document architecture that lies behind the presentational use of SVG.

With the help of all the characteristics and tools of XML, such as well-formedness, namespaces, schemas and DOM, document structure can be presented as a traversable tree, because the structure of XML itself is fundamentally tree-oriented. With this in mind we have performed several experiments to investigate the feasibility of using an SVG rendering mechanism, coupled with a simple XML-based tagset for document structure, to create an integrated and totally XML-based electronic document.

We have created an Acrobat plug-in to carry out four levels of conversion from PDF to SVG. The first level is a direct conversion of PDF (both structured and unstructured) content streams into plain SVG, with one SVG document corresponding to each page of the original PDF and with no attempt being made to map any PDF structure tree that might be present.

The second level iterates through the PDF structure tree and incorporates into the SVG document all the PDF structure elements, mapped as XML tags and named according to the PDF Standard Structure Types (sometimes referred to, informally, as the “Adobe Standard Tagset” [2]), while at the same time extracting the marked-content sequences at every leaf of the PDF tree. The resulting SVG document has custom tags at various places which mark the logical structure using an XML version of the Adobe Standard Tagset (see example below). We have written a Schema for this custom tagset to be used as a validation reference by browsers and XML parsers. Note that this XML custom tagset has its own namespace in order to avoid interference with the SVG tagset when the two are intermixed in an SVG document.

Heading Level 1
…

The third level of our conversion experiments involved writing logically structured page contents into an external XML document labelled with the Adobe Standard Tagset, while the corresponding SVG document contained simple SVG instructions to point at specific places in the XML structure document using as the referencing mechanism for relating content to document structure.

For example, text in SVG can be referenced as

paginated single XML document. Each of the inserts would have its own graphics state and could be scaled, animated, viewed independently and reused elsewhere in the document.

3. FINDINGS

We carried out a number of conversion experiments on both unstructured and structured PDF documents. These conversions involve direct extraction of content streams, iteration through the PDF structure tree, and the extraction of marked-content sequences at each tree node. It was found that a PDF document, with or without logical structure, can be converted into plain SVG, with very few problems at the content stream level. However, a structured (Tagged) PDF may present some difficulties when converted at levels 2 and above (as defined in the previous section). For instance, a traversal of the structure tree in a correctly tagged PDF document should result in the correct reading order for the textual content, but this is not always aligned with the painting, or rendering, order of the various components.

Let us consider a PDF document which contains some text superimposed on a raster image. The image occurs after the paragraph of text in terms of a left-to-right traversal of the PDF structure tree, but before the text in the linear ordering of the content streams. Because the traversal of the logical structure tree is in a depth-first (left-to-right and top-down) manner, and the SVG painting model specifies that graphic elements are painted on top of, and hence obstructing, any previous ones already on the canvas, then when the PDF is converted via the approach of iterating through the structure tree and extracting the graphic