Towards a flexible approach to manage varying and altering information representations

Tobias Haubold, Georg Beier, Wolfram Hardt Zwickau University of Applied Sciences, Informatics, Zwickau, Germany E-mail: [email protected]

The work was funded by the European Union and the Free State of Saxony.

Abstract

Information and content are described using content or document models. Content production always has a particular purpose, target audience and language. Often various forms of representation are created in different data formats, and all overlap in content and structure. Fragments of such content have relations across different origins. Such relations refer to a finer granularity than the elementary unit of most management tools. Thus management tools are unable to provide automated support for such relations, and consistency must be ensured manually during maintenance.

Using the example of learning and teaching content, this paper outlines key challenges of the problem domain, fundamental characteristics of content production and management, and necessary concepts of content and document models to address these problems. Related work on content and document models as well as management approaches is discussed, and an approach for a repository architecture is presented to target these problems on the management level.

1. Motivation

Digital learning and teaching materials are usually the result of target oriented content production processes. They are created using appropriate editing tools (office tools like Word, PowerPoint, Apache OpenOffice, LibreOffice, image processing applications, special tools to produce SCORM-based e-learning content, and more) and managed using the specific data format of the tools.

Every kind of knowledge transfer pursues an intention and follows a didactic approach. Thus teaching in presence (presentation slides), self-study (lecture notes) and interactive online learning (web content) all demand different kinds of content representations. The purpose of materials is significantly characterized by the learner as target audience. Thus materials differ by means of the didactic design, prior knowledge, expert and non-skilled target audience, or fundamental or advanced knowledge.

To create such forms of content within a reasonable time, copy and paste is usually used to reuse content fragments by duplication. In addition, the use of materials in international cooperation demands translation and localization.

On update or revision of content, the consistency of partially redundant content fragments spread across different document files needs to be ensured manually. The process of locating related content, detecting differences and collecting content for refreshing translation and localization must be done by hand and inevitably leads to time consuming maintenance of content.

This paper focuses on requirements of content and document models as well as content management approaches.

2. Key Challenges in Problem Domain

The process of producing learning and training content starts by defining a main learning path through the knowledge domain. According to this path the material is structured and created using a didactic approach (see [1]). Thus different documents are produced (e.g. presentation slides and scripts) that share the same structure but contain different representations of the content. During content maintenance the common structure must be ensured manually to keep the materials consistent.

During the production of materials with a similar target audience, copy and paste is usually used to reuse content by duplication. Thus content fragments are stored redundantly in different document files. During update or revision all duplicated content fragments must be maintained manually to ensure consistency. The same applies to derived content. Content fragments are duplicated and subsequently changed to better fit other purposes or author styles. When the comprised knowledge changes, all derived content must be considered for manual maintenance.

Multilingual materials with a high rate of change need frequent refreshing of translations and localizations.
When the latest updates are not yet available, one must deal with incompletely translated and localized materials.

Approaches to address these challenges involve document and content models as well as content management approaches.

3. Fundamental Characteristics of Content Production and Management

3.1. Reuse of Content

By using copy and paste, content can be reused by duplication. This results in redundant storage, and thus content updates have to be done manually on each copy to ensure consistency.

Content references allow content to be reused without duplication and redundant storage. Instead of the content itself, just a reference to it is stored (links, in contrast, serve as navigation). Whereas copy and paste is a feature of the editing tool, content references are a feature of the document model and must be supported by it.

3.2. Granularity in Content Production

In this context granularity is understood as the extent of content. Document models define concepts to structure content (e.g. headings for hierarchical structures in text documents or slides in presentation software) and to capture content (e.g. text in text boxes or paragraphs). Each concept represents a level of granularity. Thus there are levels with finer and coarser granularity.

Especially in document oriented content production processes, the resulting document files are tailored to particular purposes and have a coarse granularity. Within learning object based content production processes, Hamel and Ryan-Jones state that the granularity should be relatively small [2]. In addition, Duval and Hodgins state that learning objects with finer granularity are more easily reusable [3].

3.3. Granularity in Content Management

All management tasks including versioning, categorization, meta data management as well as translation are based on an elementary content management unit. Document Management Systems (DMS) and versioning tools use simple files as management unit. Due to the coarse granularity of document files, such tools are not suited to address the key challenges of section 2. Content Management Systems (CMS) use the concepts of their content model as management unit. Thus they provide finer levels of granularity but prescribe particular content models.

4. Solution Approaches in Content Production

To address the key challenges, content or document models need the following mechanisms.

4.1. Content References

The document model must provide support for content references. To use them, they must be allowed at the desired position and the type of content must be supported.

There are two kinds of content references: references to files and references to file fragments. In the latter case just the referenced fragment is included instead of the whole file. This requires mechanisms to address content fragments defined by the referenced document model.

4.2. Content Aggregations

Mechanisms to aggregate content are based on content references and allow documents to be composed out of other documents and document fragments. In addition to content inclusion, the content is adapted to the current context if required, i.e. it is included at the desired position in the hierarchical structure of the document and the document wide layout and formatting settings are applied. Layout and formatting settings particularly defined for the included content are not adjusted.

4.3. Content Filtering

Mechanisms for content filtering allow for conditional content inclusion. Configuration settings specify whether or not particular content fragments are included. Usually such mechanisms evaluate meta data attached to content fragments.

4.4. Separation of Content and Presentation

There are two common approaches to separate content and presentation. First, the content contains meta data referencing presentation details. A presentation engine resolves these references and renders the content appropriately. Examples are the HyperText Markup Language (HTML), which references Cascading Style Sheets (CSS) files, or Open Document Format (ODF), which defines format templates referenced by content.

Second, there are transformation based approaches. The content is either described by an XML based document model or an implemented content model. Transformations are described using the Extensible Stylesheet Language Transformations (XSLT) or an appropriate template language. An XSLT or template processor takes the content and transformations as input and produces output documents. Examples are DocBook, DITA or CMSs.
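The mechanisms of section 4 (content references, content aggregations and content filtering) can be illustrated with a minimal Python sketch. It is not taken from any particular document model: the class names `Fragment` and `Reference`, the `audience` meta data field, and the file name `basics.doc` are all assumptions made for illustration. The sketch resolves references to whole files and to file fragments, assigns the outline depth from the including context, and skips fragments whose meta data does not match the requested audience.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple, Union


@dataclass
class Fragment:
    """Hypothetical content fragment with meta data and child nodes."""
    fragment_id: str
    text: str
    audience: Set[str] = field(default_factory=set)  # empty set = everyone
    children: List["Node"] = field(default_factory=list)


@dataclass
class Reference:
    """Content reference: stores a target, never a copy of the content."""
    file: str
    fragment_id: Optional[str] = None  # None = reference to the whole file


Node = Union[Fragment, Reference]


def find(fragment: Fragment, fragment_id: str) -> Optional[Fragment]:
    """Address a fragment inside a file by id (depth-first search)."""
    if fragment.fragment_id == fragment_id:
        return fragment
    for child in fragment.children:
        if isinstance(child, Fragment):
            hit = find(child, fragment_id)
            if hit is not None:
                return hit
    return None


def render(node: Node, files: Dict[str, Fragment], audience: str,
           depth: int = 1) -> List[Tuple[int, str]]:
    """Return (outline depth, text) pairs for the aggregated document."""
    if isinstance(node, Reference):
        target = files[node.file]
        if node.fragment_id is not None:
            target = find(target, node.fragment_id)
            if target is None:
                raise KeyError(node.fragment_id)
        # Aggregation: the including context decides the outline depth.
        return render(target, files, audience, depth)
    if node.audience and audience not in node.audience:
        return []  # filtering: meta data excludes this fragment
    lines = [(depth, node.text)]
    for child in node.children:
        lines.extend(render(child, files, audience, depth + 1))
    return lines


# Hypothetical shared source file, stored once and reused twice.
files = {
    "basics.doc": Fragment("basics", "Basics", children=[
        Fragment("def", "Definitions"),
        Fragment("proofs", "Proofs", audience={"expert"}),
    ]),
}
slides = Fragment("slides", "Lecture slides", children=[
    Reference("basics.doc"),  # reference to a whole file
    Fragment("summary", "Summary"),
])
notes = Fragment("notes", "Lecture notes", children=[
    Fragment("part1", "Part 1", children=[
        Reference("basics.doc", "proofs"),  # reference to a file fragment
    ]),
])

slides_outline = render(slides, files, audience="novice")
notes_outline = render(notes, files, audience="expert")
```

In this sketch the fragment "Proofs" sits at outline depth 3 in both source and target only by coincidence of the example; the depth is always computed from the position of the reference, which is the adaptation step that distinguishes an aggregation from a plain inclusion.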

5. Proposed Content Management Approach

To provide automated tooling support for the key challenges mentioned in section 2, content management must detect and manage content references and content aggregations and must support meta data for content filtering on a fine level of granularity. The proposed approach focuses on a repository architecture to manage fine granular content fragments and to capture content references and aggregations in a document and content model independent manner.

5.1. Content Structure Layer

The structural aspects of content are captured by this layer, illustrated in figure 1. Opaque content represents any unstructured content and can be compared to the management unit of other management approaches. Content nodes are used to represent structured content in the form of a tree by having any number of child nodes. Content can be attached directly to content nodes or by referencing other assets. All assets are identifiable by a URI (Uniform Resource Identifier), may have any number of arbitrary meta data attached, and store content in binary form.

Figure 1: Content Structure Layer UML Sketch

5.2. Semantic Layer

All assets are resources in the semantic sense. This layer is used to classify content and to capture all kinds of content relationships. It resembles a large graph with nodes being content assets or concepts defined by semantic technologies. The edges constitute relationships between content assets, relationships to semantic meta data as well as relationships between semantic concepts.

This layer enables the use of existing meta data standards like Dublin Core [4] or LOM (Learning Object Metadata) [5] as well as their extension or supplement with custom concepts, categories, terminologies or taxonomies defined using SKOS (Simple Knowledge Organization System) [6], RDFS (Resource Description Framework Schema) [7] or OWL (Web Ontology Language) [8].

Having all content assets and semantic meta data accessible in one large graph allows for simple querying using SPARQL [9] or Gremlin [10] as well as easy exploration for similar content.

5.3. Data Storage Layer

The main purpose of this layer is to persist data. All data is stored as a binary stream. A separate data storage layer is suited to support deduplication and distribution of data to enable the use as a distributed repository.

5.4. Repository Architecture

An adapter mechanism is used to map content into the proposed management model. File based adapters build appropriate content structures based on content or document model concepts as well as content meta data. A plug-in mechanism supports extensible file based mappings. Adapters may be integrated into content production environments like office suites as well. The adapter concept allows for type based as well as instance based mappings. Thus individual content fragments of the same content type may be mapped differently. Furthermore, adapters may overcome the limitations of document models by providing extensions to support content references and aggregation mechanisms.

6. Related Work

LaTeX supports content references to files and separates content and presentation. It does not support content aggregations (because outline levels for headlines are explicit), content filtering or references to file fragments.

DocBook 5 [11] supports content references to files and content filtering, and separates content and presentation. Together with XInclude [12] and XPath [13], content references to document fragments are supported as well as limited content aggregations (the element section can be nested and specifies the outline level relatively). In the future, DocBook 5.1 [14] will support content aggregations.

DITA [15] directly supports content references to files and document fragments, content filtering, separation of content and presentation as well as content aggregations.

Document models of office suites support content references basically to images, multimedia and OLE objects (Object Linking and Embedding). Content aggregations are not supported. OLE is used to include document fragments of other OLE supported applications in a display oriented manner. If no appropriate application is available, included document fragments are usually replaced by images. Open Document [16] text documents provide the concept of sections for modular editing. It allows fragments of other documents to be included by reference and replaces the document wide style settings. Because outline levels are explicitly stated, it cannot be considered a content aggregation mechanism.
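As a small illustration of the XInclude-based file references noted above for DocBook, the following Python sketch resolves an `xi:include` element with the standard library module `xml.etree.ElementInclude`. The file name `chapter.xml`, the in-memory `DOCS` store and the document contents are assumptions made for illustration, and fragment addressing (XPointer or XPath) is deliberately not covered here; only a whole-file reference is shown.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

DOCBOOK_NS = "http://docbook.org/ns/docbook"

# Hypothetical in-memory "files"; a real setup would read from disk.
DOCS = {
    "chapter.xml": (
        '<section xmlns="http://docbook.org/ns/docbook">'
        "<title>Reuse</title><para>Shared text.</para>"
        "</section>"
    ),
}


def loader(href, parse, encoding=None):
    """Custom loader: return a parsed element for XML, raw text otherwise."""
    if parse == "xml":
        return ET.fromstring(DOCS[href])
    return DOCS[href]


book = ET.fromstring(
    '<book xmlns="http://docbook.org/ns/docbook" '
    'xmlns:xi="http://www.w3.org/2001/XInclude">'
    "<title>Course</title>"
    '<xi:include href="chapter.xml"/>'
    "</book>"
)

# Replace every xi:include element in place by the referenced content.
ElementInclude.include(book, loader=loader)

child_tags = [child.tag for child in book]
```

After the call, the `book` element contains the referenced `section` instead of the `xi:include` element, while the section itself is stored only once in `chapter.xml`.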

Only DITA provides mechanisms for all solution approaches mentioned in section 4. All other document models are not suited by default to target all key challenges of section 2. In addition, content management approaches lack support too.

According to [17] there are different kinds of CMSs, but there is no clear distinction between them. In accordance with [18], related kinds of CMS are Web CMS, Learning CMS, and Document Management Systems (DMS). Web CMS as well as Learning CMS are limited to web technology based publishing [18]. They might be suited to manage content for interactive online learning but not for materials used for teaching in presence like presentation slides or lecture notes. The proposed approach is suited for any kind of content model and does not have such limitations.

DMSs are designed to manage document files. They do not consider their internal structure. Thus they are not capable of detecting and managing content references and content aggregations. Alfresco [19], as an example, provides support to manage translations on document level. Thus it supports the third key challenge with file-based granularity. The proposed approach is able to manage content with relations on any level of granularity. The management software edu-sharing [20] is a distributed repository for teaching content. It is based on Alfresco and therefore bound to the same content management approach [21].

7. Summary

Document oriented production of learning and teaching materials is still prevalent in institutions of higher education like universities. With "documents must die", Duval [22] criticized such approaches almost 10 years ago and stated that efforts for sharing and reuse based on simple file oriented document models are bound to fail. He argued that we need more sophisticated content and document models. Though many content and document models evolve into open standards, they still have limitations and need to be supported by a flexible yet scalable content management approach.

Further research includes the mapping of content and document models to the management model, including the analysis of consequences, its suitability and effort, as well as the feasibility of extending document models with aggregation mechanisms.

References

[1] Michael Kerres. Mediendidaktik: Konzeption und Entwicklung mediengestützter Lernangebote. Oldenbourg Wissenschaftsverlag, 2012.
[2] Cheryl J. Hamel and David Ryan-Jones. Designing instruction with learning objects. International Journal of Educational Technology (IJET), 3(1), November 2002.
[3] Erik Duval and Wayne Hodgins. A LOM research agenda. In Proceedings of the twelfth international conference on World Wide Web, pages 1–9. ACM Press, 2003.
[4] DCMI Metadata Terms. DCMI Recommendation, Dublin Core Metadata Initiative (DCMI), June 2012.
[5] IEEE Learning Technology Standards Committee. IEEE Standard for Learning Object Metadata. Approved Publication of IEEE, Institute of Electrical and Electronics Engineers (IEEE), 2002.
[6] Sean Bechhofer and Alistair Miles. SKOS Simple Knowledge Organization System Reference. W3C Recommendation, World Wide Web Consortium (W3C), August 2009. http://www.w3.org/TR/2009/REC-skos-reference-20090818/.
[7] Ramanathan Guha and Dan Brickley. RDF Schema 1.1. W3C Recommendation, World Wide Web Consortium (W3C), January 2014. http://www.w3.org/TR/2014/PER-rdf-schema-20140109/.
[8] W3C OWL Working Group. OWL 2 Web Ontology Language Document Overview (Second Edition). W3C Recommendation, World Wide Web Consortium (W3C), 2012. http://www.w3.org/TR/2012/REC-owl2-overview-20121211/.
[9] Andy Seaborne and Steven Harris. SPARQL 1.1 Query Language. W3C Recommendation, World Wide Web Consortium (W3C), March 2013. http://www.w3.org/TR/2013/REC-sparql11-query-20130321/.
[10] Gremlin Graph Traversal Language. https://github.com/tinkerpop/gremlin/wiki.
[11] The DocBook Schema Version 5.0. OASIS Standard, Organization for the Advancement of Structured Information Standards (OASIS), November 2009. http://docbook.org/specs/docbook-5.0-spec-os.html.
[12] Jonathan Marsh, David Orchard, and Daniel Veillard. XML Inclusions (XInclude) Version 1.0 (Second Edition). W3C Recommendation, World Wide Web Consortium (W3C), November 2006. http://www.w3.org/TR/2006/REC-xinclude-20061115/.
[13] James Clark and Steven DeRose. XML Path Language (XPath) Version 1.0. W3C Recommendation, World Wide Web Consortium (W3C), November 1999. http://www.w3.org/TR/1999/REC-xpath-19991116.
[14] DocBook 5.1 Candidate Release 2, December 2013. http://docbook.org/xml/5.1CR2/.
[15] Darwin Information Typing Architecture (DITA) Version 1.2. OASIS Standard, Organization for the Advancement of Structured Information Standards (OASIS), December 2010. http://docs.oasis-open.org/dita/v1.2/spec/DITA1.2-spec.html.
[16] Open Document Format for Office Applications (OpenDocument) Version 1.2. OASIS Standard, Organization for the Advancement of Structured Information Standards (OASIS), September 2011. http://docs.oasis-open.org/office/v1.2/OpenDocument-v1.2.html.
[17] Wolfgang Maass. Semantic Technologies in Content Management Systems: Trends, Applications and Evaluations, chapter 1, page 4. Springer, 2012.
[18] Ann Rockley and Charles Cooper. Managing Enterprise Content: A Unified Content Strategy. Voices That Matter. Pearson Education, 2 edition, 2012.
[19] Alfresco homepage, 2014. http://www.alfresco.com/.
[20] edu-sharing homepage. http://edu-sharing.net/, 2014.
[21] Michael Klebl, J. Bernd Krämer, Annett Zobel, Matthias Hupfer, and Christian Lukaschik. Distributed Repositories for Educational Content. eleed, 7(1), 2010.
[22] Erik Duval. We're on the road to ... In Lorenzo Cantoni and Catherine McLoughlin, editors, Proceedings of the ED-MEDIA 2004 World Conference on Educational Multimedia, Hypermedia and Telecommunications, pages 3–5, Lugano, Switzerland, 2004. AACE.
