A Structured Document Transformation Model

TransM: A structured document transformation model

Nouhad Amaneddine ∗, Jean-Paul Bahsoun, Jean-Paul Bodeveix
Institut de Recherche en Informatique de Toulouse
Université de Paul Sabatier, Toulouse, France
{amaneddi, bahsoun, bodeveix}@irit.fr

Abstract: We present in this paper a transformation model for structured documents. TransM is a new model that deals with specified documents, whose structure conforms to a predefined data type. In particular, TransM works on XML documents conforming to a document type definition (DTD). We present the different components of our model and focus on the kernel of TransM, namely the transformation rules. We show the strengths and weaknesses of some recent transformation approaches, and we show that TransM can perform not only simple and direct transformations but also complicated ones, especially those that handle recursive occurrences in the structure of the modeled documents, where elements may appear at any depth in the hierarchical composition of the document structure, a problem that is not clearly resolved in existing transformation models. We propose TransM as a general model for structured document transformation and we explore in more detail its specification core, which carries out document transformations.

1 Introduction

Nowadays, documents are produced in electronic form and are often displayed using different applications. Moreover, in recent years, developments in software engineering have aimed to separate document structure from document presentation [W303b]. Sharing documents between different systems is a very useful process in computer applications. In some cases, one application needs only the part of a document that concerns particular information, whereas another application manipulates the whole content [CI03][MR97].
Furthermore, document transformation has attracted increasing attention from many research teams in the last few years [XF01][Pe03][Wi03]. Consequently, a dedicated language for specifying transformations should be defined. It may contain transformation rules that describe relationships between a source structure and a destination structure, and which can be used to generate a target structured document instance conforming to a predefined document type. The source and the destination models could be instances of the same document structure definition, or they may conform to two different document structures.

∗ Lebanese University, Faculty of Sciences

Structured document transformation relies on several modules. The core is the transformation language, based on rules or expressions. Then strategies for traversing the input document must be defined. Advanced conditions and constraints may be applied in order to filter selected information. Finally, the writing policy is used to place the corresponding elements in the output structure. Some models use their own definition languages to carry out the transformations [OM02]; others prefer to generate code for a specialized language rather than using a general one [Pe03]. In addition, data manipulation and the application of particular functions to the document content may be needed. Consequently, many modules are used in order to accomplish all the steps of the transformation process.

The chosen strategy is built according to the selected document structure. As XML [W303b] became the standard markup language for structured documents, a tree structure is preferred as the backbone structure for documents. Even though it has a tree form, an XML document can be considered as a graph, since XML makes it possible to represent graphs. This can be done by having an attribute of the source node refer to the destination node.
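The way attribute references turn an XML tree into a graph can be sketched as follows. This is only an illustration under stated assumptions: the attribute names `id` and `ref`, the sample document and the helper `build_edges` are hypothetical and are not part of TransM. A DTD would normally declare such attributes with the ID and IDREF types; since Python's `xml.etree.ElementTree` does not validate against a DTD, the sketch checks uniqueness and dangling references by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the "ref" attribute of a source node names the
# "id" of a destination node, adding graph edges on top of the tree.
DOC = """
<doc>
  <node id="n1" ref="n3">first</node>
  <node id="n2" ref="n1">second</node>
  <node id="n3">third</node>
</doc>
"""

def build_edges(root, id_attr="id", ref_attr="ref"):
    """Return (identity -> element map, list of (source id, destination id))."""
    by_id = {}
    for el in root.iter():
        ident = el.get(id_attr)
        if ident is not None:
            if ident in by_id:
                # Identities must be unique, otherwise references are ambiguous.
                raise ValueError(f"duplicate identity: {ident}")
            by_id[ident] = el
    edges = []
    for el in root.iter():
        target = el.get(ref_attr)
        if target is not None:
            if target not in by_id:
                raise ValueError(f"dangling reference: {target}")
            edges.append((el.get(id_attr), target))
    return by_id, edges

root = ET.fromstring(DOC)
by_id, edges = build_edges(root)
print(edges)
```

The tree itself stays unchanged; the extra edges exist only through the identity map, which is why each referenced node needs a distinct identity.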
Each node in the document should therefore be given a unique identity so that it can be referred to by that identity.

This paper presents a new model for structured document transformation, named TransM. We show the specification of the kernel of TransM, namely the transformation rules, and we present the general algorithm that drives the transformation process. The paper is organized as follows: in section two we present an overview of structured documents and their frameworks. We detail TransM in section three. We present some recent related work in section four, then we conclude in section five with perspectives for our future work.

2 Structured documents and transformation overview

A document is considered to be structured when it contains, implicitly or explicitly, additional information about its hierarchical composition [AB03][XF01]. Many types of documents are considered structured, such as those applying predefined grammars, which can be context-free grammars or any other type of grammar [Mu98]. The structure of the document is the backbone of its internal representation. It is the hidden side of the external representation, which is created in conformance with the internal one and presented to the user of the application. The most important representation of a structured document is one that contains only the actual content of the document in structured form. A structured document can have a tree structure or a graph structure, or it can conform to constraints defined in a form other than a grammar. XML documents conforming to some Document Type Definition are such examples.

When we specify the transformation, we indicate how the rules should be applied and what they mean. Direct transformations are made when we have a simple transformation process and we assume that we will not use the same transformation again.
Otherwise it is wiser to define the rules that we need in order to benefit from the reusability of the specifications.

Recent research results focus on model transformation [Wi03][PR02]. We remark that the literature distinguishes between document transformations and model transformations, although the two research areas can be placed under one single title, namely structured document transformation. The QVT approach [OM02] is one of the proposals that work on model transformation. For us, as long as any object to be transformed can be represented in a structural form, the resolution of the transformation relies on the general framework defined by the transformation of documents.

The main problem with the existing transformation methods is that almost all of them are efficient only when the transformation process is not complicated. This case arises repeatedly in transformations where the input and output structures are very close and where no manipulation has to be performed on the transformed information. In this case, simple transformations are sufficient, and useful rules and efficient schemas are applied in an easy and comprehensible way to accomplish the transformation. The problem starts when the structures of the input and output documents are quite different and when we need to change the contents in order to produce the required document. In this case, we found that either the methods used can handle those transformations but are too complicated to be understood and applied, and do not cover all the complicated situations that such transformations may raise, or they are unable to support advanced transformations. The question that then arises is what the interest of providing such a method is and how relevant its application is, especially for a delicate transformation process.
We present in the next section the TransM model, which can support not only direct and simple transformations but also more complicated ones.

3 Core of TransM transformation model

Before exploring our transformation rules, which represent the kernel of TransM, we present the eXtensible Stylesheet Language Transformations (XSLT). We show the different steps of the transformation process, then we introduce our model and justify the choice of generating XSLT code.

3.1 XSLT

The W3C proposed XSLT as a transformation language for XML documents [W303b]. This language has considerable computational power. A key element of XSLT is the sub-language of patterns, which is used to match and select elements. The pattern language of XSLT has evolved into XPath [W303b], a language for selecting nodes of a tree. It performs the core functions of XSLT. XSLT is a recursive XML transformation language, and an XSLT program can be thought of as an ordered collection of templates. Each template has an associated pattern and contains a nested set of construction rules. A template processes nodes that match the selected pattern and constructs output according to the construction rules. The transformation starts at the root of the input document, and the construction rules specify, by means of construction patterns, where in the XML document the transformation process should continue. However, XSLT requires detailed and tedious programming to accomplish complex structure transformations. As an alternative, we construct our transformation rules at the structure level. An XSLT program is then generated automatically in order to perform transformations at the instance level.

3.2 Transformation process

The process consists of transforming an XML document conforming to a predefined document type definition into another XML document conforming to a different document type definition. It starts by establishing the association relations between the input and the output structures, as shown in Figure 1.
Those associations are chosen from a set of predefined possible associations. Then the transformations are specified by a set of rules. Those rules are used to carry out the required transformations at the concrete level. The XSLT program generated from the transformation rules aims to transform a document conforming to the input structure into another document conforming to the output structure.

[Figure 1: Elements Associations. Input structure with nodes A, B, C, D, E, F, G; output structure with nodes A, C, X, f(D), g(B,F,G), Y, Z.]

In this figure we show the input structure and the output structure in tree form.
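The idea of element associations can be illustrated with a small sketch. It is not the TransM rule language and it does not generate XSLT: it applies the associations directly in Python, purely to show how output elements such as f(D) and g(B,F,G) are computed from input elements. The association table, the bodies of `f` and `g`, the output tag names `fD` and `gBFG`, and the nesting of the sample instance are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def f(d):
    # Placeholder for f in Figure 1 (assumed): upper-case the text of D.
    return d.upper()

def g(b, f_text, g_text):
    # Placeholder for g in Figure 1 (assumed): join the three texts.
    return " ".join([b, f_text, g_text])

# Hypothetical association table: (output tag, input tags, function).
ASSOCIATIONS = [
    ("C", ["C"], lambda c: c),      # direct association: copy C
    ("fD", ["D"], f),               # one input element, transformed
    ("gBFG", ["B", "F", "G"], g),   # several input elements combined
]

def transform(source_root):
    """Build an output tree by applying each association to the input tree."""
    target_root = ET.Element(source_root.tag)
    for out_tag, in_tags, func in ASSOCIATIONS:
        # ".//tag" finds the first descendant with that tag at any depth.
        texts = [source_root.findtext(f".//{tag}", default="") for tag in in_tags]
        ET.SubElement(target_root, out_tag).text = func(*texts)
    return target_root

# Hypothetical input instance; F and G are nested one level deeper.
source = ET.fromstring(
    "<A><B>b</B><C>c</C><D>d</D><E><F>x</F><G>y</G></E></A>"
)
target = transform(source)
print(ET.tostring(target, encoding="unicode"))
```

Keeping the associations in a table separate from the traversal code mirrors the separation between structure-level rules and instance-level execution described above, although here the execution step is plain Python rather than a generated XSLT program.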
