Towards a Flexible Approach to Manage Varying and Altering

Total Page:16

File Type:pdf, Size:1020Kb

Towards a Flexible Approach to Manage Varying and Altering Towards a flexible approach to manage varying and altering information representations Tobias Haubold, Georg Beier, Wolfram Hardt Zwickau University of Applied Sciences, Informatics, Zwickau, Germany E-mail: [email protected] ence. Thus materials differ by means of the didactic design, Abstract prior knowledge, expert and non-skilled target audience or fundamental or advanced knowledge. Information and content is described using content or docu- To create such forms of content within a reasonable time ment models. Content production always has a particular pur- copy and paste is usually used to reuse content fragments by pose, target audience and language. Often various forms of rep- duplication. In addition the use of materials in international resentations are created in different data formats and all overlap cooperation demand translation and localization. in content and structure. Fragments of such content have relations On update or revision of content the consistency of par- across different origins. Such relations refer to a finer granularity tially redundant content fragments spread across different than the elementary unit of most management tools. Thus man- agement tools are unable to provide automated support for such document files need to be ensured manually. The process of relations and consistency must be manually ensured during main- locating related content, detecting differences and collect- tenance. ing content for refreshing translation and localization must Using the example of learning and teaching content this pa- be done by hand and leads inevitably to time consuming per outlines key challenges of the problem domain, fundamental maintenance of content. characteristics of content production and management and neces- This paper focuses on requirements of content and doc- sary concepts of content and document models to address these ument models as well as content management approaches. problems. Related work on content and document models as well as management approaches is discussed and an approach for a 2. Key Challenges in Problem Domain repository architecture is presented to target these problems on management level. The process of producing learning and training content starts by defining a main learning path through the knowl- edge domain. According to this path the material is struc- 1. Motivation tured and created using an didactic approach (see [1]). Thus different documents are produced (e.g. presentation slides and scripts) sharing the same structure but contain different Digital learning and teaching materials are usually the representations of the content. During content maintenance result of target oriented content production processes. They the common structure must be ensured manually to keep the are created using appropriate editing tools (Office-Tools like materials consistent. Word, Powerpoint, Apache OpenOffice, LibreOffice, image During the production of materials with a similar target processing applications, special tools to produce SCORM- audience copy and paste is usually used to reuse content by based e-learning content, and more) and managed using the duplication. Thus content fragments are stored redundantly specific data format of the tools. in different document files. During update or revision all Every kind of knowledge transfer pursues an intention duplicated content fragments must be maintained manually and follows a didactic approach. Thus teaching in pres- to ensure consistency. The same applies to derived con- ence (presentation slides), self-study (lecture notes) and in- tent. Content fragments are duplicated and subsequently teractive online learning (web content) demand all different changed to better fit other purposes or author styles. Dur- kinds of content representations. The purpose of materials ing change of the comprised knowledge all derived content is significantly characterized by the learner as target audi- must be considered for manual maintenance. The work was funded by the European Union and the Free State of Multilingual materials with a high rate of change need Saxony. frequent refreshing translations and localizations. When the 566 latest updates are not yet available, one must deal with in- 4. Solution Approaches in Content Production completely translated and localized materials. Approaches to address these challenges involve docu- To address the key challenges content or document mod- ment and content models as well as content management els need the following mechanisms. approaches. 4.1. Content References 3. Fundamental Characteristics of Content The document model must provide support for content Production and Management references. To use them they must be allowed on the desired position and the type of content must be supported. 3.1. Reuse of Content There are two kinds of content references: references to files and references to file fragments. In the latter case just By using copy and paste content can be reused by dupli- the referenced fragment is included instead of the whole cation. This results in redundant storage and thus content file. This requires mechanisms to address content fragments updates have to be done manually on each copy to ensure defined by the referenced document model. consistency. Content references allow to reuse content without dupli- 4.2. Content Aggregations cation and redundant storage. Instead of the content itself just a reference to it is stored1. Whereas copy and paste is Mechanisms to aggregate content are based on content a feature of the editing tool, content references are a feature references and allow to compose documents out of other of the document model and must be supported by it. documents and document fragments. In addition to con- tent inclusion the content is adapted to the current context if required, i.e. it is included on the desired position in the hi- 3.2. Granularity in Content Production erarchical structure of the document and the document wide layout and formatting settings are applied. Layout and for- In this context granularity is understood as the extend matting settings particularly defined for the included con- of content. Document models define concepts to structure tent are not adjusted. content (e.g. headings for hierarchical structures in text doc- uments or slides in presentation software) and capture con- 4.3. Content Filtering tent (e.g. text in text boxes or paragraphs). Each concept represents a level of granularity. Thus there are levels with Mechanisms for content filtering allow for conditional finer and coarser granularity. content inclusion. Some kind of configuration settings spec- Especially in document oriented content production pro- ify whether or not particular content fragments are included cesses resulting document files are tailored to particular pur- or not. Usually such mechanisms evaluate meta data at- poses and have a coarse granularity. Within learning object tached to content fragments. based content production processes Hamel et al. states that the granularity should be relatively small [2]. In addition 4.4. Separation of Content and Presentation Duval et al. states that learning objects with finer granular- ity are more easily reusable [3]. There are two common approaches to separate content and presentation. First, the content contains meta data 3.3. Granularity in Content Management referencing presentation details. A presentation engines resolves these references and renders the content appro- All management tasks including versioning, categoriza- priately. Examples are the HyperText Markup Language tion, meta data management as well as translation are based (HTML) which references Cascading Style Sheets (CSS) on an elementary content management unit. Document files or Open Document (ODF) which defines format tem- Management Systems (DMS) and versioning tools use sim- plates referenced by content. ple files as management unit. Due to the coarse granularity Second, there are transformation based approaches. The of document files such tools are not suited to address the content is either described by an XML based document key challenges of section 2. Content Management Systems model or an implemented content model. Transforma- (CMS) use the concepts of their content model as manage- tions are described using the Extensible Stylesheet Lan- ment unit. Thus they provide finer levels of granularity but guage Transformations (XSLT) or an appropriate template prescribe particular content models. language. An XSLT or template processor takes the con- tent and transformations as input and produces output doc- 1links in contrast serve as navigation uments. Examples are DocBook, DITA or CMSs. 567 5. Proposed Content Management Approach 5.3. Data Storage Layer To provide automated tooling support for the key chal- The main purpose of this layer is to persist data. All data lenges mention in section 2, content management must de- is stored as binary stream. A separate data storage layer is tect and manage content references and content aggrega- suited to support deduplication and distribution of data to tions and must support meta data for content filtering on a enable the use as distributed repository. fine level of granularity. The proposed approach focuses on a repository architecture to manage fine granular content 5.4. Repository Architecture fragments and to capture content references and aggrega- tions in a document and content model independent manner. An adapter mechanism is used to map content into the proposed management model. File based adapters build 5.1.
Recommended publications
  • THE TEXT ENCODING INITIATIVE Edward
    THE TEXT ENCODING INITIATIVE Edward Vanhoutte & Ron Van den Branden Centre for Scholarly Editing and Document Studies Royal Academy of Dutch Language and Literature - Belgium Koningstraat 18 9000 Gent Belgium [email protected] [email protected] KEYWORDS: text-encoding, markup, markup languages, XML, humanities Preprint © Edward Vanhoutte & Ron Van den Branden - The Text Encoding Initiative Bates & Maack (eds.), Encyclopedia of Library and Information Sciences - 1 ABSTRACT The result of community efforts among computing humanists, the Text Encoding Initiative or TEI is the de facto standard for the encoding of texts in the humanities. This article explains the historical context of the TEI, its ground principles, history, and organisation. Preprint © Edward Vanhoutte & Ron Van den Branden - The Text Encoding Initiative Bates & Maack (eds.), Encyclopedia of Library and Information Sciences - 2 INTRODUCTION The Text Encoding Initiative (TEI) is a standard for the representation of textual material in digital form through the means of text encoding. This standard is the collaborative product of a community of scholars, chiefly from the humanities, social sciences, and linguistics who are organized in the TEI Consortium (TEI-C <http://www.tei-c.org>). The TEI Consortium is a non-profit membership organisation and governs a wide variety of activities such as the development, publication, and maintenance of the text encoding standard documented in the TEI Guidelines, the discussion and development of the standard on the TEI mailing list (TEI-L) and in Special Interest Groups (SIG), the gathering of the TEI community on yearly members meetings, and the promotion of the standard in publications, on workshops, training courses, colloquia, and conferences.
    [Show full text]
  • Towards a Content-Based Billing Model: the Synergy Between Access Control and Billing
    TOWARDS A CONTENT-BASED BILLING MODEL: THE SYNERGY BETWEEN ACCESS CONTROL AND BILLING Peter J. de Villiers and Reinhardt A. Botha Faculty of Computer Studies, Port Elizabeth Technikon, Port Elizabeth [email protected], [email protected] Abstract In the internet environment the importance of content has grown. This has brought about a change in emphasis to provide content-based ser- vices in addition to traditional Internet transactions and services. The introduction of content-based services is the key to increase revenues. To e®ectively increase revenues through content-based services e±cient billing systems need to be implemented. This paper will investigate a presumed synergy between access control and billing. The investigation identi¯es three phases. Subsequently the similarities and di®erences be- tween access control and billing in each of these phases are investigated. This investigation assumes that information delivery takes place in an XML format. Keywords: access control, billing, XML 1. INTRODUCTION The global economy has made a transition from an economy with an industrial focus to being an economy based on knowledge and infor- mation. This new paradigm is enabled by the use of Information and Communication Technologies (ICT). The increasing pace of technological innovations in the ¯eld of ICT has given rise to new ways of communicating, learning and conducting business. The Internet has facilitated the establishment of a "borderless" environment for communications and the electronic delivery of certain services. This is known as electronic business (e-business). Electronic business is the key to doing business in the new global economy that is based on knowledge and information [15].
    [Show full text]
  • 3.4 the Extensible Markup Language (XML)
    TEI BY EXAMPLE MODULE 0: INTRODUCTION TO TEXT ENCODING AND THE TEI Edward Vanhoutte Ron Van den Branden Melissa Terras Centre for Scholarly Editing and Document Studies (CTB) , Royal Academy of Dutch Language and Literature, Belgium, Gent, 9 July 2010 Last updated September 2020 Licensed under a Creative Commons Attribution ShareAlike 3.0 License Module 0: Introduction to Text Encoding and the TEI TABLE OF CONTENTS 1. Introduction....................................................................................................................................................................1 2. Text Encoding in the Humanities...............................................................................................................................2 3. Markup Languages in the Humanities.......................................................................................................................3 3.1 Procedural and Descriptive Markup.................................................................................................................... 3 3.2 Early Attempts......................................................................................................................................................... 3 3.3 The Standard Generalized Markup Language (SGML)...................................................................................... 4 3.4 The eXtensible Markup Language (XML)............................................................................................................5 4. XML: Ground Rules........................................................................................................................................................5
    [Show full text]
  • Never the Same Stream: Netomat, Xlink, and Metaphors of Web Documents Colin Post Patrick Golden Ryan Shaw
    Never the Same Stream: netomat, XLink, and Metaphors of Web Documents Colin Post Patrick Golden Ryan Shaw University of North Carolina at University of North Carolina at University of North Carolina at Chapel Hill Chapel Hill Chapel Hill School of Information and School of Information and School of Information and Library Science Library Science Library Science [email protected] [email protected] [email protected] ABSTRACT value for designers and engineers of studying “paths not taken” during the history of the technologies we work on today. Document engineering employs practices of modeling and representation that ultimately rely on metaphors. Choices KEYWORDS among possible metaphors, however, often receive less Hypertext; XLink; browsers; XML; digital art attention than choices driven by practical requirements such as performance or usability. Contemporary debates over how to meet practical requirements always presume some shared 1 Introduction metaphors, making them hard to discern from the surrounding context in which the engineering tasks take place. One way to The metaphor of the printed page has been a prominent address this issue is by taking a historical perspective. In this and indeed from the earliest implementations of hypertext. paper, we attend to metaphorical choices by comparing two structuring principle of Web documentsd hypertext since in the contrast �irst browser, to the historical case studies of “failed” designs for hypertext on the printed page, as whatever “could not conveniently be netomat (1999), a Web browser created preAlthoughsented Nelsonor represented �irst de�ine on paper” [29, page 96], many by the artist Maciej Wisniewski, which responded to search subsequent hypertext systems treated documents as queriesWeb.
    [Show full text]
  • Streaming-Based Processing of Secured XML Documents
    Streaming-based Processing of Secured XML Documents Chair for Network- and Data Security Horst Görtz Institute for IT Security Ruhr-University Bochum Juraj Somorovský Matriculation Number: 108 006 245 025 2.10.2009 Supervisors: Prof. Jörg Schwenk Dipl.-Inf. Meiko Jensen Eidesstattliche Erklärung Ich versichere hiermit, dass ich meine Masterarbeit mit dem Thema Streaming-based Processing of Secured XML Documents selbständig verfasst und keine anderen als die angegebenen Quellen und Hilfsmittel benutzt habe. Die Arbeit wurde bisher keiner anderen Prüfungsbehörde vorgelegt und auch nicht veröentlicht. Bochum, den 2. Oktober 2009 Abstract WS-Security is a standard providing message-level security in Web Services. It allows exible application of security mechanisms in SOAP messages. Therewith it ensures their integrity, condentiality and authenticity. However, using sophisticated security algorithms can lead to high memory consumptions and long evaluation times. In the combination with the standard XML DOM processing approach the Web Services servers become a simple target of Denial-of-Service attacks. This thesis presents a solution for these problems an external streaming-based WS-Security Gateway. First the diculties and limitations of the streaming-based approach applied on secured XML documents are discussed. Then the implementation of the WS-Security Gateway is presented. The implementation is capable of processing XML signatures in SOAP messages. The evaluation shows that streaming-based approach enhances the performance and is much more ecient in comparison to standard DOM-based frameworks. It is also proved that the implementation is under certain circumstances resistant against signature wrapping attacks. Contents 1 Introduction1 2 Related work3 2.1 XML Signature .
    [Show full text]
  • Standoff Markup Peter L
    Enhancing composite Digital Documents Using XML-based Standoff Markup Peter L. Thomas David F. Brailsford Document Engineering Laboratory Document Engineering Laboratory School of Computer Science School of Computer Science University of Nottingham University of Nottingham Nottingham, NG8 1BB, UK Nottingham, NG8 1BB, UK [email protected] [email protected] ABSTRACT 1. INTRODUCTION Document representations can rapidly become unwieldy if they In two previous papers [1, 2] we have set out techniques for try to encapsulate all possible document properties, ranging using XML templates as a guide for embedding customised from abstract structure to detailed rendering and layout. structure into PDF files. A Structured PDF file usually marks up its PDF content with the Adobe Standard Structured Tagset We present a composite document approach wherein an XML- (SST). The SST Tags (see [3]) are roughly equivalent to those based document representation is linked via a ‘shadow tree’ of in HTML in that they denote document components such as bi-directional pointers to a PDF representation of the same paragraphs, titles, headings etc. Unlike XML, it is not possible document. Using a two-window viewer any material selected to directly intermix SST tags and PDF content. Instead, the in the PDF can be related back to the corresponding material in tags comprise the nodes of a separate structure tree and each of the XML, and vice versa. In this way the treatment of specialist these nodes contains a pointer to a linked list of marked material such as mathematics, music or chemistry (e.g. via ‘read content within the PDF itself.
    [Show full text]
  • Enhancing the Security of Applications Using XML-Based Technologies
    Enhancing the Security of Applications using XML-based Technologies Syed I Ahmed Signal Stream Inc. Signal Stream Inc. 580 Wilkie Drive Orleans, Ontario. K4A 1N3 CANADA Project Manager: S. Dahel 613-993-9949 Contract Number: W7714-3-08506 Contract Scientific Authority: S. Dahel 613-993-9949 DEFENCE R&D CANADA - OTTAWA Contractor Report DRDC Ottawa CR 2003-201 December 2003 The scientific or technical validity of this Contractor Report is entirely the responsibility of the contractor and the contents do not necessarily have the approval or endorsement of Defence R&D Canada. Abstract XML and associated core family of XML based W3C Recommendations have no built-in information security features but have rich characteristics that can be ex- ploited to devise numerous security schemes. The evolving XML Security Stan- dards and COTS tools that can enhance the security of applications are identified. DRDC Ottawa CR 2003-201 i This page intentionally left blank. ii DRDC Ottawa CR 2003-201 TABLE OF CONTENTS 1. EXECUTIVE SUMMARY ................................................................................................................. 1 2. REPORT ON PHASE- I WORK ....................................................................................................... 3 2.1 Research Approach....................................................................................................................... 3 2.2 The concept of information security.............................................................................................. 3 2.3
    [Show full text]
  • Twenty Years of Theological Markup Languages: a Retro- and Prospective by Race J
    40 THEOLOGICAL LIBRARIANSHIP • VOL. 12, NO. 1 APRIL 2019 Twenty Years of Theological Markup Languages: A Retro- and Prospective by Race J. MoChridhe ABSTRACT: ThML—the first open, XML-based markup language designed specifically for digital libraries handling theological collections—was conceived in 1998, sparking a period of development in discipline-specific markup languages for theology that lasted until the early 2000s, when the dominance of the TEI standard led the field to stagnate. Despite the disappearance of the active developer communities behind most of the projects and technical improvements in TEI, however, ThML and other languages developed during that period remain in use. After present- ing a brief history of theology-specific markup, this article seeks to understand what its persistence tells us about the discipline-specific needs of biblical and theological studies that are still not being met by TEI, and offers insights as to the lessons that may be drawn from these projects for the future of theological markup. In 1998, Harry Plantinga, a computer science professor at Calvin College, began work on the Theological Markup Language (ThML)—the first open, XML-based markup language designed specifically for digital libraries handling theological collections.1 Over the past two decades, ThML has provided critical technical infrastructure to a wide range of projects, both academic and popular, that have disseminated theological literature to millions of users. In 1999, a consortium of leading organizations in biblical studies and Bible publishing started development on projects that became the Open Scripture Information Standard (OSIS)— another XML-based language intended for specialized theological use.
    [Show full text]
  • XML Base W3C Proposed Recommendation 20 December 2000
    XML Base W3C Proposed Recommendation 20 December 2000 This version: http://www.w3.org/TR/2000/PR-xmlbase-20001220 (available in: HTML, XML) Latest version: http://www.w3.org/TR/xmlbase Previous version: http://www.w3.org/TR/2000/CR-xmlbase-20000908 Editor: Jonathan Marsh (Microsoft) <[email protected]> Copyright ©2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Abstract This document proposes a facility, similar to that of HTML BASE, for defining base URIs for parts of XML documents. Status of this document On 20 December 2000, this document enters a Proposed Recommendation review period. From that date until 31 January 2001, W3C Advisory Committee representatives are encouraged to review this specification and return comments in their completed review to w3c-xlink- [email protected]. Comments sent to this list will be made visible to Members after the review. Please send any comments of a confidential nature in separate email to [email protected], which is visible to the Team only. After the review, the Director will announce the document's disposition: it may become a W3C Recommendation (possibly with minor changes), it may revert to Working Draft status, or it may be dropped as a W3C work item. This announcement should not be expected sooner than 14 days after the end of the review. Publication as a Proposed Recommendation does not imply endorsement by the W3C membership. This is still a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite W3C Proposed Recommendations as other than "work in progress.
    [Show full text]
  • The Text Encoding Initiative: Flexible and Extensible Document Encoding
    The Text Enco ding Initiative: Flexible and Extensible Do cument Enco ding David T. Barnard and Nancy M. Ide Technical Rep ort 96-396 ISSN 0836-0227-96-396 Department of Computing and Information Science Queen's University Kingston, Ontario, Canada K7L 3N6 Decemb er 1995 Abstract The Text Enco ding Initiativeisaninternational collab oration aimed at pro ducing a common enco ding scheme for complex texts. The diversity of the texts used by memb ers of the communities served by the pro ject led to a large sp eci cation, but the sp eci cation is structured to facilitate understanding and use. The requirement for generality is sometimes in tension with the requirement to handle sp ecialized text typ es. The texts that are enco ded often can b e viewed or interpreted in several di erentways. While many electronic do cuments can b e enco ded in very simple ways, 1 some do cuments and some users will tax the limits of any xed scheme, so a exible extensible enco ding is required to supp ort research and to facilitate the reuse of texts. 1 Intro duction Computerized pro cessing of structured do cuments is an imp ortant theme in information science [1, 13 ]. To take full advantage of progress in this eld it is necessary to have convenient electronic representations of structured do cuments. The publication of Guidelines for Electronic Text Encoding and Interchange [14 ] by the Text Enco ding Initiative TEI [16] was the result of several years of collab orative e ort by memb ers of the humanities and linguistics research and academic community in Europ e, North America and Asia to derive a common enco ding scheme for complex textual structures [11].
    [Show full text]
  • Markup Overlap: a Review and a Horse
    Extreme Markup Languages 2004® Montréal, Québec August 2-6, 2004 Markup Overlap: A Review and a Horse Steven DeRose Bible Technologies Group Abstract "Overlap" is the common term for cases where some markup structures do not nest neatly into others, such as when a quotation starts in the middle of one paragraph and ends in the middle of the next. OSIS [Duru03], a standard XML schema for Biblical and related materials, has to deal with extreme amounts of overlap. The simplest case involves book/chapter/verse and book/story/ paragraph hierarchies that pervasively diverge; but many types of overlap are more complicated than this. The basic options for dealing with overlap in the context of SGML [ISO 8879] or XML [Bray98] are described in the TEI Guidelines [TEI]. I summarize these with their strengths and weaknesses. Previous proposals for expressing overlap, or at least kinds of overlap, don't work well enough for the severe and frequent cases found in OSIS. Thus, I present a variation on TEI milestone markup that has several advantages. This is now the normative way of encoding non-hierarchical structures in OSIS documents. Markup Overlap: A Review and a Horse Table of Contents Introduction.......................................................................................................................................1 The problem types.............................................................................................................................1 Evaluation criteria.............................................................................................................................2
    [Show full text]
  • Evaluation of Xpath Queries Against XML Streams Dan Olteanu
    Evaluation of XPath Queries against XML Streams Dan Olteanu Dissertation zur Erlangung des akademischen Grades des Doktors der Naturwissenschaften an der Fakult¨at fur¨ Mathematik, Informatik und Statistik der Ludwig{Maximilians{Universit¨at Munc¨ hen vorgelegt von Dan Olteanu Munc¨ hen, Dezember 2004 Erstgutachter: Fran¸cois Bry Zweitgutachter: Dan Suciu (University of Washington) Tag der mundlic¨ hen Prufung:¨ 11. Februar 2005 To my wife Flori iv v Abstract XML is nowadays the de facto standard for electronic data interchange on the Web. Available XML data ranges from small Web pages to ever-growing repositories of, e.g., biological and astronomical data, and even to rapidly changing and possibly unbounded streams, as used in Web data integration and publish-subscribe systems. Animated by the ubiquity of XML data, the basic task of XML querying is becoming of great theoretical and practical importance. The last years witnessed efforts as well from practitioners, as also from theoreticians towards defining an appropriate XML query language. At the core of this common effort has been identified a navigational approach for information localization in XML data, comprised in a practical and simple query language called XPath [46]. This work brings together the two aforementioned \worlds", i.e., the XPath query eval- uation and the XML data streams, and shows as well theoretical as also practical relevance of this fusion. Its relevance can not be subsumed by traditional database management systems, because the latter are not designed for rapid and continuous loading of individual data items, and do not directly support the continuous queries that are typical for stream applications [17].
    [Show full text]