Journal of Computing and Information Technology - CIT 14, 2006, 2, 161–174
doi:10.2498/cit.2006.02.07

Transclusions in an HTML-Based Environment*

Josef Kolbitsch (1) and Hermann Maurer (2)
(1) Graz University of Technology, Austria
(2) Institute for Information Systems and Computer Media, Graz University of Technology, Austria

Transclusions are an advanced technique for the inclusion of existing content into new documents without the need to duplicate it. Although originally described in the early 1960s, transclusions have still not been made available to users and authors on the world wide web. This paper describes the prototype implementation of a system that allows users to write articles that may contain transclusions. The system offers a simple web-based interface where users can compose new articles. With a single button, the user can insert a transclusion from any HTML page available on the world wide web. While other approaches introduce new markups for the HTML specification, make use of technologies such as XML and XLink, or employ authoring systems that internally support transclusions and can generate web pages as output, this implementation relies solely on the techniques provided by an HTML-based environment. Therefore HTML, Javascript, the Document Object Model, CGI scripts and HTTP are the core technologies utilised in the prototype.

Keywords: transclusions, xanalogical structure, authoring systems, publishing systems, web-based applications.

1. Introduction

In 1965, Ted Nelson presented “a file structure for the complex, the changing and the indeterminate”, in which he introduced the term hypertext (see [20]). One of the fundamental concepts in Nelson’s notion of hypertext is a technique called transclusions. Transclusions allow authors to include portions of existing documents into their own articles without duplicating them. Basically, a transclusion in document A is a reference to a portion of the content of a potentially remote document B that is virtually included into document A (see Figure 1).

Fig. 1. Exemplary transclusion. Part of document B (top right) is transcluded into document A (top left). Bottom: the source code of document A does not contain the actual text of document B, but only the data required to retrieve it from the original document.

The following sub-sections discuss the background of transclusions and their use in HTML. Section 2 addresses attempts to implement transclusions. Our design and implementation of transclusions in HTML-based environments are detailed in section 3. Issues that were encountered during the implementation are described

in section 4. Finally, section 5 discusses several aspects of our implementation.

* This paper was supported by the Styria Professorship for Revolutionary Media Technologies.

1.1. Background of Transclusions

Transclusions are designed as a complete replacement for all cut-and-paste mechanisms in use. Nelson argues that cut-and-paste is not what people actually want to do but that it is a restriction imposed upon authors by the nature of paper. Writers actually do not want to make a copy of an existing document, cut out the piece they want to reuse and paste it into their document. They want to include the original content and let readers know what the source and the context of the quote are (e.g., [21]).

Reference lists at the end of a scientific publication, for instance, are usually not what is intended by writers and desired by readers. They are rather a pragmatic solution to the problem that both the source and the context of the quotation are lost by copying-and-pasting a portion of content printed on paper.

What used to be physical restrictions of paper was embraced by most computing systems in an attempt to resemble the work environments and common processes in offices (cf. [33]). Therefore most current graphical operating systems make use of metaphors such as a desktop, folders and documents; a document has to be put in exactly one folder; there is a clipboard, and content from a different document is included using copy-and-paste mechanisms (see [23]).

1.2. Implications of Transclusions

Transclusions are, however, more than a mere replacement for copy-and-paste. They assure that the original context of a quotation is preserved and can provide a visible link to the source of the transclusion. Ted Nelson’s approach to realising this functionality is based on transpointing windows (e.g., [22]).

Moreover, authors of documents can be notified when their articles are transcluded. Thus they can, for instance, find out about other researchers in the same area. Authors using transclusions, on the other hand, can be informed automatically about modifications in source documents (see [14]).

Apart from obvious improvements in authoring and publishing systems, transclusions can also offer a solution to copyright issues experienced today on the world wide web: authors include content into their documents by means of transclusions. Whenever a reader views a transclusion, a note about the rights associated with the transcluded content is added, and a micropayment is made to the corresponding owner [25]. Nelson names this model transcopyright (see [24]).

1.3. Transclusions and HTML

The Hypertext Markup Language (HTML, [7]) is a relatively simple language for describing platform-independent hypertext pages. In the early stages of its development, the focus of HTML was on style and graphical presentation rather than on functionality and underlying paradigms. Therefore, many innovative ideas such as bidirectional links, and issues already known at that time including broken links, were not considered in the implementation (e.g., [27]).

In principle, transclusions are used in HTML. Designated markups including img, object and embed incorporate content such as images, Java applets and animations into HTML pages by means of linking. Thus, these elements basically make use of the concept of transclusions.

Transclusions in HTML are very limited, though. Only certain media such as images can be virtually included, whereas textual content, in general, cannot be transcluded. Moreover, the transclusion mechanisms available in HTML can only be applied to entire documents. Fine-grained transclusions such as a small spatial selection of an image are not implemented.

2. Attempts to Implement Transclusions

Although the idea of transclusions was proposed some forty years ago, only a few attempts to implement this advanced technique have been made. The following sections give an overview of several notable approaches to the realisation of transclusions.

2.1. Xanadu

Transclusions are an integral part of Xanadu, Ted Nelson’s original hypertext system (e.g., [21]). Their implementation relies on a document model, though, that is radically different from what is widely used today. In Xanadu, documents (versions) do not contain content but references to the actual content. Content is both stored and referenced with the highest granularity possible, on the level of single characters. All content is retained in content repositories.

Any document is made up of a list of references to content stored in the system, e.g., a document consists of “characters 124 to 729 and 1276 to 1301 from the repository”. When content from document B is transcluded into document A, the corresponding references to the actual content in the repositories are added to the reference list of document A.

Thus, the creation and retrieval of transclusions in Xanadu are trivial list operations. Ted Nelson also details a number of functions related to transclusions and the handling of situations in which documents are modified or large portions of documents are deleted (e.g., [25]). Basically, these functions can be seen as more complex list operations.

2.2. Proposal to Amend the HTML Specification

Since the idea of transclusions is already present in HTML for media such as images and multimedia animations, a sensible approach to text-based transclusions is to introduce a new tag that allows users to transclude text. Therefore, [28] suggests an amendment to the HTML specification. A new markup, text, is proposed with the intention to offer an element of the same significance as img or embed.

The main attributes of the markup are the URI of the source document and the start position and length of the text to be transcluded. The web browser analyses the tag, loads the source document of the transclusion, extracts the portion of text given by the attributes of the text tag, and inserts it into the document. Thus, a transclusion is handled in a similar way as an inline image.

Although the proposal seems rational, it has not been accepted, and no web browser to date has the feature implemented.

2.3. Transclusions with IFrames and Embedded Objects

The recommendation for HTML 4 includes markups for inline frames and embedded objects (e.g., [7]). Both inline frames and embedded objects define areas within a given HTML document that can be used to display potentially remote resources. Inline frames can merely contain HTML pages and images, whereas objects may contain resources of arbitrary type.

Transcluding document A into document B can, for instance, be achieved by inserting an object tag with a reference to document A into document B [16]. The capabilities of this technique are rather limited, though. Only entire documents can be referenced. Moreover, the context is lost because a link from the document containing the embedded object to the source of the transclusion is not provided by these markups. Therefore this approach is not well suited for realising transclusions.

2.4. XML-Based Transclusions

The Extensible Markup Language, XML, is a flexible language for describing documents that contain structured information (see [31]). In contrast to other markup languages such as HTML, where both syntax and semantics are determined, neither a set of tags nor the semantics are defined in XML. Therefore, XML per se does not contain a distinct markup for links; a separate linking language is used instead.

XLink, the XML Linking Language (e.g., [30]), provides a framework for describing the syntax and semantics of even complex linking structures between resources. An XLink link typically contains a number of attributes that describe, for instance, what resource is to be loaded and when it is to be displayed.

Three attributes are essential for the implementation of transclusions using XLink:

- href: the document to be loaded. Set to the source document of the transclusion;

- actuate: when the resource is to be loaded. When set to onLoad, the resource is loaded when the document containing the XLink link is loaded;
- show: in which manner the resource is to be displayed. When set to embed, the resource is displayed practically instead of the XLink tag.

The skeleton of the XLink link shown in listing 1 transcludes the entire document source.xml into the document containing the link at the position of the link. A similar approach to transclusions in XML is described by [29].

    <my:transclusion
        xmlns:my="http://www.kolbitsch.org"
        xmlns:xlink="http://www.w3.org/1999/xlink"
        xlink:type="simple"
        xlink:href="source.xml"
        xlink:actuate="onLoad"
        xlink:show="embed">

Listing 1. Fragmentary transclusion with XLink.

With the XML Pointer Language (XPointer, [32]), fragments of XML documents can be identified and addressed as well. Thus, a combination of XML, XLink and XPointer can be employed to make the use of fine-grained transclusions in XML-based environments possible (see [16; 15]).

2.5. Recent Projects Involving Transclusions

Currently, several mostly academic projects that experiment with transclusions exist. The following paragraphs introduce three selected systems.

The University of Nottingham, UK, has proposed a technology-based learning environment that adapts to its users (see [19]). The information retained in the system is organised in small “chunks” that are stored as XML files.

Since the system is adaptive, lessons are not static but assembled dynamically on the basis of a lesson plan. When a user requests a particular lesson, appropriate chunks of information are retrieved and included into a virtual document by means of transclusion. Thus, the system facilitates the reuse of small pieces of information for a number of lessons or for students with differing standards of knowledge.

At the University of Bologna, Italy, researchers attempt to combine existing software products such as Internet Explorer and Microsoft Word to offer a collaborative editing environment for the world wide web (see [3]). The implemented tool, XanaWord, allows users to edit any web page they view in their web browser, even if they do not have write permissions for the resource.

Any page displayed in Internet Explorer can be opened with a word processor such as Microsoft Word, where the user can make arbitrary changes to the document. When the user saves the page, only the changes to the original document are stored in the XanaWord repository. Whenever a document is retrieved from the repository, the modifications made by the user and the content from the original resource are included in a dynamically generated document by means of transclusion. Finally, the dynamic document is sent to the user’s web browser.

The Institute for Information Systems and Computer Media in Graz, Austria, proposed an environment capable of handling transclusions in various output document formats (see [14]). The system includes three components:

- the Latex typesetting system that allows users to create documents and save them in a number of document formats including Postscript, PDF and HTML;
- an extension to Latex that allows users to create transclusions; and
- a Hyperwave Information Server (see [9]) that handles issues such as linking and versioning.

In the proposed environment users can insert a special markup that designates a transclusion in Latex documents. Then, the user has to upload the file to a Hyperwave Information Server that extracts links and saves them in a link database, etc. When the document is requested by a user, the transclusions and links are inserted into the file saved on the server. The resulting intermediate file is processed by Latex in order to generate the requested document format. Ultimately, the document containing the transclusion is sent to the client.
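Of the approaches surveyed in this section, the reference-list document model of Xanadu (section 2.1) lends itself to a short illustration. The following is a minimal sketch under stated assumptions: the class names Span and Document, the single in-memory repository string and the sample text are hypothetical and are not taken from Xanadu itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """A reference to a run of characters in the shared repository."""
    start: int  # index of the first character (inclusive)
    end: int    # index one past the last character (exclusive)


@dataclass
class Document:
    """A document holds no text of its own, only references to content."""
    spans: List[Span] = field(default_factory=list)

    def render(self, repository: str) -> str:
        # Retrieval is a walk over the reference list.
        return "".join(repository[s.start:s.end] for s in self.spans)

    def transclude(self, span: Span, position: Optional[int] = None) -> None:
        # Transcluding adds a reference; no characters are copied.
        if position is None:
            self.spans.append(span)
        else:
            self.spans.insert(position, span)


# Hypothetical repository content used only for this illustration.
first = "Transclusions include content by reference."
second = " Copies are not made."
repository = first + second

doc_b = Document([Span(0, len(first))])
doc_a = Document([Span(len(first), len(repository))])

# Document A transcludes the content of document B at its beginning.
doc_a.transclude(doc_b.spans[0], position=0)
```

After the call to transclude, both documents refer to the same characters in the repository, which is why creating and retrieving a transclusion reduces to list operations in this model.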

3. Implementation

In contrast to several approaches to transclusions illustrated above, this project does not present a proposal but an actual implementation of a system that lets users take advantage of transclusions. It is designed as part of a larger system that offers communities instruments to work actively with content from digital libraries and electronic encyclopaedias (see [12]). Thus, a prototype is implemented offering a tool for authoring new articles that can contain transclusions. It is available online at [11].

3.1. Design Goals and Requirements

The environment for creating and retrieving transclusions aims at facilitating the reuse of information readily available on the Web, even by novice users. Therefore a number of design goals have to be taken into consideration:

- ease of use: the tool for making transclusions must be as easy to use as traditional copy-and-paste mechanisms;
- use of any document on the web: not only documents from a closed repository but basically any web page may be the source of a transclusion;
- level of granularity: any portion of text may be transcluded from a document, from a single character to the entire content of a page.

Browser plug-ins or special software tools should not be required. Therefore, this implementation of transclusions relies solely on technologies available and widely utilised on the world wide web:

- HTML: transclusions can be made from any HTML formatted document available on the web. Moreover, documents containing transclusions are presented to the reader as traditional HTML documents (see [7]);
- Javascript, DOM: internally, most current web browsers represent HTML pages as trees of objects. The underlying technology is the Document Object Model (DOM, [4; 5]). Javascript is used to access individual objects in the DOM tree of the HTML page to be transcluded (see [26; 6]) and enables fine-grained transclusions;
- HTTP: documents containing transclusions are transmitted to the readers using the Hypertext Transfer Protocol (see [8]).

3.2. System Overview

Since the system for creating and retrieving transclusions consists of a number of components, a brief overview is given. The following description follows the order of actions taken by a user in authoring and reading a document including transclusions.

Two fundamental actions can be distinguished in the system: the creation of a transclusion when the article is authored, and its evaluation when the page containing the transclusion is to be displayed. When the user wants to create an article with a transclusion, the web browser presents a frameset with two frames. One contains a conventional text area for authoring HTML content and an additional button for adding a transclusion. When the user presses this button, the URL of a page can be entered, and the corresponding page is loaded through an HTTP proxy application into the second frame. The user can either transclude content from this page or can use the second frame to browse to a different page, again through the HTTP proxy application. The document to be transcluded is complemented with a button that inserts the transclusion into the text area of the first frame, when pressed. An illustration of the two frames is given in Figure 5.

The user selects the portion of text to be transcluded in the second frame and presses the button to have the transclusion actually inserted into the article. The button calls a Javascript function that determines the start and end positions of the selection made. Together with the URL of the page in the second frame, these values are used to generate an intermediate markup that is inserted into the article (see section 3.4).

Once a user has finished authoring an article and chooses to save the new document, the contents of the text area including the intermediate transclusion markup are sent to a CGI script on the server. The server stores the “static” text and the values provided through the transclusion tag to a database. In addition to this, metadata on the source of the transclusion is collected and stored in the database.
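The storing step described above can be sketched as follows. This is a minimal illustration and not the code of the prototype: the table names, the column names and the helper functions create_schema and store_article are assumptions, and the source URL together with an MD5 hash of the fetched page merely stands in for the metadata that is collected.

```python
import hashlib
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical layout: one table for the static text of an article,
    # one table for the transclusions that belong to it.
    conn.execute(
        "CREATE TABLE article (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE transclusion ("
        " id INTEGER PRIMARY KEY,"
        " article_id INTEGER NOT NULL REFERENCES article(id),"
        " src TEXT NOT NULL,"
        " source_md5 TEXT NOT NULL)"
    )


def store_article(conn: sqlite3.Connection, static_text: str,
                  src_url: str, source_page: bytes) -> int:
    """Store the static text and one transclusion; return the article id."""
    # The hash of the source page serves as metadata about the source.
    fingerprint = hashlib.md5(source_page).hexdigest()
    cur = conn.execute("INSERT INTO article (body) VALUES (?)", (static_text,))
    article_id = cur.lastrowid
    conn.execute(
        "INSERT INTO transclusion (article_id, src, source_md5)"
        " VALUES (?, ?, ?)",
        (article_id, src_url, fingerprint),
    )
    conn.commit()
    return article_id


# Usage with an in-memory database and a hypothetical source page.
conn = sqlite3.connect(":memory:")
create_schema(conn)
article_id = store_article(
    conn,
    "Static text of the article.",
    "http://example.org/page.html",
    b"<html><body><p>Source text.</p></body></html>",
)
```

Keeping the source page itself out of the database and storing only data about it mirrors the idea that transcluded content is retrieved from its original location when the article is displayed.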

Whenever the article containing the transclusion is requested, a second CGI script is invoked. The script retrieves the contents of the article and the parameters of the transclusion tag from the database. The parameters defining the transclusion are used to load the original page from its original location. If it is unchanged, the transcluded portion of the text is extracted from the original page, combined with the static text and sent to the web browser of the client (see Figure 5).

The next section gives a brief introduction to the architecture of the implementation and its components.

3.3. System Architecture

Our implementation of transclusions follows a classic client-server paradigm. A conventional HTTP server, a relational database, several server-side CGI programs, a non-transparent HTTP proxy application and client-side Javascript code are the main components of the system.

Fig. 2. Overview of the server-side components.

The CGI script “Create and Store” in Figure 2 receives the data submitted by users. It analyses the content of the article, extracts transclusions and stores both content and transclusions in the internal database of the system (see section 3.4). The “Extract and Merge” script, on the other hand, reads the content of an article together with the information on the transclusion from the database, fetches the source document of the transclusion, assembles the complete article and sends it to the user (see section 3.5).

The third CGI script, “Fetch External”, is utilised during the authoring process for loading the page to be transcluded. This programme is basically necessary to insert a button and a small portion of Javascript code into the corresponding page (see section 4.1). It relies on a specialised, non-transparent proxy application developed for this project.

In the current prototype implementation the relational database consists of only two tables. While one table contains the static content of the article, the other one stores detailed information on the transclusion as well as a rich set of metadata and a fingerprint of the source document.

3.4. Creating a Transclusion

As explained above, the interface for authoring new articles consists of a frameset with a frame for writing an article in an HTML form and a separate frame for displaying the content to be transcluded (Figure 5). When users wish to insert a transclusion they select the portion of text with the pointer device and click on the button provided in the window.

Listing 2. Syntax of an intermediate transclusion tag (top) and an example (bottom).

The button calls a Javascript function which is essentially the only operation carried out on the client computer. It accesses the document object model to determine the exact start and end positions of the selection made by the user and generates an intermediate tag that is inserted into the article. The Javascript interface to selections provided by most browsers is somewhat peculiar in that it determines these start and end positions in the way the user actually marked the text. I.e., the anchor of the selection is the position where the user clicked to indicate the beginning of the selection. Then, the user drags the mouse, for instance, to the end of the selection and releases the mouse button to denote

the end of the selection. The end position is the focus. In the following paragraphs, anchor and focus are denoted by the prefixes “a” and “f”.

The syntax of the newly introduced transclusion markup with its seven parameters is rather complex (see listing 2). This level of detail is required to be able to determine the exact start and end positions of transclusions, though. Values in curly braces describe the type of attribute values:

- src: the URL of the document to be transcluded;
- atag, ftag: the names of the tags in which the transclusion starts and ends, e.g., “P” for a paragraph;
- aindex, findex: the index of the tags in which the transclusion starts and ends, e.g., the seventh paragraph in the document;
- aoffset, foffset: the offset within the start and end tags, e.g., the transclusion starts at the second character of the seventh paragraph in the document.

The exemplary tag shown in listing 2, for instance, describes a transclusion that starts at the first character of the second H1 heading and ends at the 30th character of the fifth paragraph in the given document.

When the article containing the transclusion is saved by the user, the data is sent to the server, and the “Create and Store” CGI program is invoked (see Figure 3). It extracts the transclusion from the article, determines the attributes of the transclusion and writes the information to the database. The transclusion markup in the original article is replaced with a transclusion object that refers to the transclusion stored in the database.

Fig. 3. Simplified schematic illustration of the process of creating a new article containing a transclusion.

The source document of the transclusion is not stored in the internal repository. However, its URL, the creation and modification dates as well as an MD5 hash value of the entire page content are retained as a fingerprint. These values are necessary to determine if the source document has changed when the transclusion is retrieved.

It should be noted that, in contrast to [28], where an amendment to the HTML specification is suggested, the transclusion markup in this implementation is only used during the authoring process. It is inserted when the user makes a transclusion, is evaluated by the system and replaced by a transclusion object. When the page containing the transclusion is to be displayed, the transclusion object is replaced with the corresponding content from the original page (see below). Hence, the transclusion tag is only visible within the system but not externally to the user.

3.5. Retrieving a Transclusion

The “logic” of our implementation mainly lies within the component that retrieves transclusions. Whenever an article is requested, its body is analysed for the presence of transclusion objects. For each transclusion object the following steps have to be carried out (see also Figure 4):

- resolve the object and retrieve the information on both the transclusion and on the source document from the database;
- check if the given URL of the source document can be loaded;
- if it can be retrieved, check if the metadata, i.e., the creation and modification dates as well as the MD5 hash values, have changed;
- if the fingerprint of the source document is valid, retrieve the resource and extract the

  portion of text determined by the start and end positions of the transclusion;
- replace the transclusion object in the article with the transcluded content;
- if any of the operations above fails, insert an apologetic error message.

Every transclusion is formatted in a way that readers can distinguish between authentic and transcluded content. In Figure 5, transcluded text is highlighted using a light-gray background. Transclusions are complemented with a hyperlink to the original source of the content.

Fig. 4. Handling a request for a page containing a transclusion (simplification).

Fig. 5. Screenshots from a prototype of transclusions in an HTML-based authoring environment.

4. Issues Encountered

During the implementation and evaluation of the prototype a number of difficulties were experienced. A few substantial issues are addressed in the following sections.

4.1. Javascript Restrictions

As described in section 3.4, our implementation relies on Javascript code that detects which portions of a document are selected by the user; when the user presses a button, the start and end positions of the selection are determined. Restrictions imposed by the security mechanisms of most modern web browsers (e.g., [18]) prevent Javascript functions from accessing selections in “foreign” frames and documents. This means that the button that reads the user’s selection has to be present in the same frame as the selection.

Since our premise was that transclusions can be made from any HTML document on the Web, we have to make sure that the Javascript code required is inserted in any page the user wants to transclude. The approach in the current implementation is to use a non-transparent proxy application. So when users enter the URL of the page they wish to transclude, the page is not loaded directly by the web browser, but by a CGI script on the server that acts as an HTTP proxy. The CGI script appends the demanded Javascript code and sends the document to the client.

The proxy application could be omitted if transclusions were only made in documents from an internal repository such as an online journal or a content management system. The system generating the documents could automatically insert the essential Javascript code when the resource is requested, for example, with a particular parameter.
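The insertion of the button and the Javascript code by the proxy can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code of the “Fetch External” script: the function name inject_transclusion_code, the injected markup and the script path /transclude.js are hypothetical, and the sketch places the snippet before the closing body tag where one exists instead of strictly appending it.

```python
import re

# Hypothetical markup added by the proxy: a button plus the script that
# reads the user's selection in the same frame.
INJECTED = (
    '<button onclick="insertTransclusion()">Transclude selection</button>\n'
    '<script src="/transclude.js"></script>\n'
)

# Matches a closing body tag regardless of case and inner whitespace.
_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def inject_transclusion_code(html: str, snippet: str = INJECTED) -> str:
    """Return html with snippet placed before the last closing body tag.

    If the page has no closing body tag, the snippet is appended.
    """
    matches = list(_BODY_CLOSE.finditer(html))
    if not matches:
        return html + snippet
    pos = matches[-1].start()
    return html[:pos] + snippet + html[pos:]


# Usage with a hypothetical page fetched by the proxy.
page = "<html><body><p>Source text.</p></body></html>"
served = inject_transclusion_code(page)
```

Because the button and the script are delivered as part of the proxied page, they live in the same frame as the selection and are therefore not affected by the restriction on “foreign” frames.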

4.2. Browser Specific Implementation

The function for accessing the user’s selection poses yet another problem. Different implementations of the corresponding function exist in the various web browsers available today. The Mozilla family accesses the selection through the document.getSelection method, whereas Internet Explorer, for instance, uses a dedicated document.selection object (e.g., [10]).

Due to the use of the document.getSelection method in our implementation, the prototype is only compatible with Mozilla-based browsers. With minor modifications in the client-side Javascript code, however, the prototype should work with a wide range of web browsers including Internet Explorer.

4.3. Modified Documents and Unavailable Resources

Similar to broken links in web pages, documents that are modified and resources that become unavailable can pose a problem for transclusions. One reason for this deficiency is the use of Uniform Resource Locators on the world wide web (URLs, [2]).

URLs identify an object and describe its physical location. Defining the physical location of a document determines that only one instance of the document may exist at a time. Different resource identification and allocation mechanisms allow for multiple locations of the same document, i.e., several instances of the same document may exist in different physical locations. When a resource with a certain object identifier is requested, it is retrieved from one of the locations that retain a copy (e.g., [27]). This can, for instance, be the location with the fastest network connection, the one with the lowest load, or the one with the shortest distance.

The design of Xanadu takes a similar approach, in which a resource may exist in several locations (e.g., [25]). Thus, when a transclusion is requested and one instance of the source data becomes unavailable, it is retrieved from another repository containing the same information.

We propose an analogous mechanism that makes use of the Wayback Machine (e.g., [13]), a very large archive of currently about forty billion web pages, and local caching. When a transclusion is requested whose source document has undergone major changes or has become unavailable, the Wayback Machine is queried for the resource. The query includes the URL and the creation date of the transclusion as access date.

Alternatively, a local cache or Google Cache can be employed. Local caching means that a copy of a resource has to be made when it is transcluded; the local copy is retained in an internal repository of the system. In case of local caches, however, legal issues may arise. [1], for instance, discusses whether services such as Google Cache are in conflict with German copyright laws.

Figure 6 illustrates the suggested retrieval strategy for transclusions: when the original source of a transclusion is available and has not been changed, it is retrieved from the original location. Otherwise an attempt is made to load the page from the Wayback Machine or from a similar cache. If this attempt fails as well, the user is notified that the transclusion cannot be made at this time.

Fig. 6. Flow chart of a document retrieval strategy where the source document of a transclusion can become unavailable or can be modified.

5. Discussion

The implementation of transclusions in a purely HTML-based environment has shown interesting perspectives, and various aspects need to be investigated in detail and require further research. A few selected topics are pointed out in the following sections.

5.1. Robustness

The prototype we presented offers ease of use and relative overall stability. The robustness, however, can still be improved. Under certain conditions, for example, transclusions can be imprecise. Content transcluded from a document by the “Extract and Merge” component can be slightly different from what a user originally selected: a few characters too many or too few are extracted.

An issue that generally affects the robustness of our implementation and demands in-depth analysis is modified content. The shortcoming partly arises from an optimization that improves the system performance. When modifications in the source document of a transclusion are to be detected, only the creation and modification dates as well as the content length in the HTTP header of the resource are scanned.

recent articles. Although the actual content of the document is not altered, the system component that analyses the state of transclusion sources would detect a modification.

It is desirable to have modifications in documents and their importance (was only an advertisement changed, or has the meaning of the article changed?) detected automatically. However, this functionality is presently computationally not feasible. [14] suggests leaving the decision to the user: despite the modifications the content from the altered source document is transcluded, and the user has to determine if the transclusion and the context are still appropriate.

In any case, authors whose transclusion sources have been modified should be notified that the content they virtually included into their articles might not be suitable anymore, and that it has to be reviewed.

5.2. Aspects of the Design

The design of the implementation, the use of a transclusion object in particular, opens up exciting opportunities. Since the transclusion object is associated directly with the source document of a transclusion, it is possible to determine which other articles in our system include content from the same source. This information indicates that the corresponding articles might deal with a similar topic and that they could be of interest for both authors and readers. More
Some importantly, this information denotes hat the au- servers do not return these values at all, though, thors of these articles might work in a similar and a small percentage of hosts return invalid area. In a scientific setting, for instance, these date values. So if creation and modification authors can be researchers working on similar dates or the content length are not available, the projects. Thus, information exchange can be entire resource is retrieved and an MD5 hash enabled. From a more general perspective, col- value is generated. When the content to be re- laboration can be fostered and organisational

trieved is very large, the system load is high or knowledge management can be facilitated e. g.  the network connection is slow, it might take too ,  17 . long to calculate the hash value. In this case, the Therefore we propose a simple function like process might terminate with a time-out signal, Which other articles transclude the same doc- and the transclusion cannot be made. ument? or Who else uses the same document? As pointed out above, modifications in transclu- that can help readers and writers discover new sion sources are a general problem. Especially information. dynamic content such as pages from a content This principle can be applied in the “opposite management system or from a digital library can direction” as well. Authors can easily find out be critical. In many cases, these documents con- which other articles in the system transclude tain advertisements or other frequently chang- the articles they produced. This information ing information such as references to the most can basically be used for the same purposes as Transclusions in an HTML-Based Environment 171 pointed out above. Hence, we propose another Although a number of issues were encountered function that complements every article in the during the implementation phase, we have been system: Which articles in the system transclude capable of collecting valuable results that make this article or simply put, Who transcludes ‘us’? us confident that we can enhance the current prototype and increase its robustness and stabil- In a more sophisticated approach, the system ity. could pro-actively point out resources and au- thors that are related to the article being dis- The innovative design of the transclusion struc- played. tures, as well as the architecture of system components, open up new perspectives and can lead to more advanced functionality. Facilitat- 5.3. 
Aspects of the Proxy Application ing information discovery, pro-active dissemi- nation of related content and the stimulation of Our implementation relies on a non-transparent community-building are only a few possibilities HTTP proxy application that makes it possible among others. to insert a small portion of Javascript code into every page the user wishes to transclude. Al- With the infrastructure provided by the proxy though the application was initially intended for application elaborate functionality such as au- a very specific purpose, its design is so flexible tomatic highlighting of information, dynamic that a whole range of other, largely unrelated content adaptation and on-the-fly insertion of applications become feasible. related data can be introduced. This concept can prove to be beneficial in numerous environ- Blacklisting of words and hyperlinks, highlight- ments including electronic encyclopaedias and ing of text and dynamic insertion of annota- digital libraries. tions are just a few simple examples. Advanced techniques may include dynamic adaptation of These ideas will be discussed thoroughly in a content, on-the-fly insertion of complementing future paper. information, etc. These novel ideas need to be explored in detail. 7. Acknowledgements A comprehensive analysis and an evaluation of early results will be presented in a forthcoming publication. The first author would like to thank Edmund Haselwanter for his valuable input on the Way- back Machine. 6. Conclusion References This paper briefly outlined Ted Nelson’s notion of hypertext and one of its prime concepts –

transclusions. Although HTML has been influ-   1 M. BAHR, The Wayback Machine und Google Cache

enced by the notion of transclusions for the in- – eine Verletzung deutschen Urheberrechts?

deaufsatzhtm

clusion of external objects such as images, they httpwwwjurpc ,  have not been implemented consistently. There- 2002 Accessed May 11th, 2005.

fore a number of proposals have been made on   2 T. BERNERS-LEE, Universal Resource Identifiers in how to implement transclusions with the tech- WWW. A Unifying Syntax for the Expression of nologies available today. A few of the most Names and Addresses of Objects on the Network

important approaches have been discussed. as Used in the World-Wide-Web, Request for Com- 

ments 1630 1994 . See also

Addressingrfctxt Based merely on the technologies provided by httpwwwworg .

a web-based environment, we have designed a   3 A. DI IORIO AND F. VITALI, A Xanalogical Collab- system that offers users to author articles that orative Editing Environment, Proceedings of the

may contain transclusions. A first prototype Second International Workshop of Web Document

  utilises plain HTML, Javascript and server-side Analysis WDA2003 , 2003 Edinburgh, UK, Au- gust 2003. See also

components including CGI scripts and a spe-

httpwwwcsclivacukwda

PapersSection cialised HTTP proxy application. IIIPaper .

[4] L. WOOD ET AL., Document Object Model (DOM) Level 1 Specification, Version 1.0, http://www.w3.org/TR/REC-DOM-Level-1/, 1998 (Accessed March 29th, 2005).

[5] P. LE HÉGARET ET AL., W3C Document Object Model (DOM), http://www.w3.org/DOM/, 2005 (Accessed March 29th, 2005).

[6] ECMA, ECMAScript Language Specification, 3rd Edition, Standard ECMA-262 (1999). See also http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf.

[7] D. RAGGETT ET AL., HTML 4.01 Specification, http://www.w3.org/TR/…, 1999 (Accessed April 28th, 2005).

[8] R. FIELDING ET AL., Hypertext Transfer Protocol -- HTTP/1.1, Request for Comments 2616 (1999). See also ftp://ftp.isi.edu/in-notes/rfc2616.txt.

[9] Hyperwave, http://www.hyperwave.com/.

[10] P.P. KOCH, JavaScript - Get selection, http://www.quirksmode.org/js/selected.html, 2004 (Accessed May 10th, 2005).

[11] Transclusions in HTML-Based Environments, http://www.kolbitsch.org/research/transclusions/.

[12] J. KOLBITSCH, H. MAURER, Community Building around Encyclopaedic Knowledge, 2005 (to be published).

[13] R. KOMAN, How the Wayback Machine Works, http://webservices.xml.com/lpt/a/ws/…/brewster.html, 2002 (Accessed May 10th, 2005).

[14] H. KROTTMAIER, Transcluded Documents: Advantages of Reusing Document Fragments, Proceedings of the 6th International ICCC/IFIP Conference on Electronic Publishing (ELPUB2002), (2002) Karlovy Vary, Czech Republic, pp. 359–367. See also http://hkrott.iicm.edu/docs/publications/elpub….pdf.

[15] H. KROTTMAIER, D. HELIC, Issues of Transclusions, Proceedings of the World Conference on E-Learning in Corporate, Government, Healthcare, & Higher Education (E-Learn 2002), (2002) Montreal, Canada, pp. 1730–1733. See also http://coronet.iicm.edu/denis/pubs/elearn…b.pdf.

[16] H. KROTTMAIER, H. MAURER, Transclusions in the 21st Century, Journal of Universal Computer Science, 12 (2001), pp. 1125–1136. See also http://www.jucs.org/jucs_7_12/transclusions_in_the_21st.

[17] H. MAURER, K. TOCHTERMANN, On a New Powerful Model for Knowledge Management and its Applications, Journal of Universal Computer Science, 1 (2002), pp. 85–96. See also http://www.jucs.org/jucs_8_1/on_a_new_powerful.

[18] MICROSOFT CORPORATION, About Cross-Frame Scripting and Security, http://msdn.microsoft.com/workshop/author/om/xframe_scripting_security.asp, 2005 (Accessed May 10th, 2005).

[19] A. MOORE ET AL., Personally tailored teaching in WHURLE using conditional transclusion, Proceedings of the Twelfth ACM Conference on Hypertext and Hypermedia, (2001) Aarhus, Denmark, pp. 163–164.

[20] T.H. NELSON, A File Structure for the Complex, the Changing and the Indeterminate, Proceedings of the ACM 20th National Conference, (1965) Cleveland, OH, U.S.A., pp. 84–100.

[21] T.H. NELSON, Literary Machines, Mindful Press, 1981.

[22] T.H. NELSON, The Heart of Connection: Hypermedia Unified by Transclusion, Communications of the ACM, 8 (1995), pp. 31–33.

[23] T.H. NELSON, Generalized Links, Micropayment and Transcopyright, http://www.almaden.ibm.com/almaden/npuc…/tnelson.htm, 1996 (Accessed May 1st, 2003).

[24] T.H. NELSON, Transcopyright: Pre-Permission for Virtual Republishing, http://www.aus.xanadu.com/xanadu/transcopy.html, 1998 (Accessed May 3rd, 2005).

[25] T.H. NELSON, Xanalogical Structure, Needed Now More than Ever: Parallel Documents, Deep Links to Content, Deep Versioning and Deep Re-Use, ACM Computing Surveys, 4es (1999). See also http://xanadu.com.au/ted/XUsurvey/xuDation.html.

[26] NETSCAPE COMMUNICATIONS, JavaScript Developer Central, http://developer.netscape.com/tech/javascript/, 2004 (Accessed February 3rd, 2004).

[27] A. PAM, Where World Wide Web Went Wrong, Proceedings of the AUUG’95 & Asia-Pacific World Wide Web ’95 Conference & Exhibition, (1995) Sydney, Australia. See also http://www.csu.edu.au/special/conference/apwww…/papers…/apam/apam.html.

[28] A. PAM, Fine-Grained Transclusion in the Hypertext Markup Language, Internet Draft (1997). See also http://xanadu.com.au/archive/draft-pam-html-fine-trans-….txt.

[29] E. WILDE, D. LOWE, XML Linking Language. In XPath, XLink, XPointer, and XML: A Practical Guide to Web Hyperlinking and Transclusion (E. Wilde and D. Lowe), (2002) pp. 169–198. Addison-Wesley Professional. See also http://searchwebservices.techtarget.com/searchWebServices/downloads/wilde_lowe.pdf.

[30] S. DEROSE ET AL., XML Linking Language (XLink) Version 1.0, http://www.w3.org/TR/xlink/, 2001 (Accessed April 28th, 2005).

[31] W3C, Extensible Markup Language (XML), http://www.w3.org/XML/, 2003 (Accessed April 28th, 2005).

[32] S. DEROSE ET AL., XML Pointer Language (XPointer), http://www.w3.org/TR/xptr/, 2002 (Accessed May 2nd, 2005).

[33] M. YOCOM, Mac OS History, http://www.macos.utah.edu/Documentation/MacOSXClasses/macosxone/macintosh.html, 2004 (Accessed May 3rd, 2005).

Received: May, 2005 Accepted: February, 2006

Contact address:

Josef Kolbitsch
Graz University of Technology
Steyrergasse 30
8010 Graz, Austria
e-mail: josef.kolbitsch@tugraz.at

Hermann Maurer
Institute for Information Systems and Computer Media
Graz University of Technology
Inffeldgasse 16c
8010 Graz, Austria
e-mail: hmaurer@iicm.edu

JOSEF KOLBITSCH is a PhD student at Graz University of Technology. His research interests include electronic encyclopaedias, digital libraries and hypermedia.

HERMANN MAURER is professor and dean of the faculty of computer science at Graz University of Technology. He is author of some twenty books and more than 600 contributions in various publications. Recently he has also published “XPERTS”, a series of science fiction novels.