Semantic Web 1 (2016) 1–5. IOS Press.

Facilitating Data-Flows at a Global Publisher using the Linked Data Stack


Christian Dirschl a, Katja Eck a, Jens Lehmann b,*, Lorenz Bühmann b, Sören Auer c, Bert Van Nuffelen d
a Wolters Kluwer Deutschland GmbH, 85716 Unterschleissheim, Germany. E-mail: [email protected], [email protected]
b University of Leipzig, Institute of Computer Science, AKSW Group, Augustusplatz 10, D-04009 Leipzig, Germany. E-mail: {lastname}@informatik.uni-leipzig.de
c University of Bonn, Computer Science, Enterprise Information Systems & Fraunhofer IAIS, Bonn, Germany. E-mail: [email protected]
d TenForce, Havenkant 38, 3000 Leuven, Belgium. E-mail: [email protected]

Abstract. The publishing industry is on the verge of an era in which professional customers of publishing products in particular are no longer primarily interested in comprehensive books and journals, i.e. traditional publishing products, but in small, structured information pieces delivered just-in-time as a certain information need arises. This requires a transformation of publishing workflows towards the production of much richer metadata for fine-grained and highly interlinked pieces of content. Linked Data can play a crucial role in this transition. The Linked Data Stack is an integrated distribution of aligned tools which support the whole lifecycle of Linked Data from extraction and authoring/creation via enrichment, interlinking and fusing to maintenance. In this application paper, we describe a real-world usage scenario of the Linked Data Stack at a global publishing company. We give an overview of the Linked Data Stack and the underlying lifecycle of Linked Data, describe data-flows and usage scenarios at a publisher, and then show how the stack supports those scenarios.

Keywords: Publishing, Linked Open Data, Linked Data Stack

1. Introduction

In times of tablets and a growing number of other electronic devices, publishers are more and more forced to move towards electronic publishing. Possibilities for consuming information are changing and so do the expectations of customers. For instance, digital work environments offer new functionalities (e.g. non-linear storytelling, inclusion of background information, delivery of just-in-time, context-specific and personalized information), and therefore the internal workflow processes, especially for those publishers targeting professional audiences (e.g. legal, tax, accounting professionals), have to be adapted in order to provide this high-value content.

Let us motivate our work in more detail with a real-world user scenario: Gerhard, an accounting professional, works for TWC, a leading tax and accounting consultancy. He is responsible for certifying the value-added-tax (VAT) returns of the Europe-wide operating food retailer Aldo (a customer of the tax and accounting consultancy). For Gerhard it is crucial to track all changes of VAT regulations in all the countries Aldo operates in, which might include countries in the Euro zone, other EU member states and a few neighboring countries. As a result, Gerhard needs to be informed whenever a law related to VAT regulations is changed in one of these countries. In addition, he has to track court decisions of all cases related to VAT regulations. Because Aldo has to update its ERP (enterprise resource planning) systems when VAT regulations change, Gerhard wants to proactively notify Aldo's IT department as early as possible when major regulatory changes (e.g. an increase of the VAT for certain products in a certain country) are planned. Currently, Gerhard and his team have to track a vast number of textual sources provided to TWC by a global publisher and several smaller regional publishers specialized in the legal, accounting and tax domains for relevant legislation and regulatory changes. In the future, the global publisher aims to deliver to Gerhard and his colleagues at TWC much more personalized and context-specific information pieces fulfilling exactly their information needs. Gerhard aims to register the sources (legislation in certain countries, court decisions and parliamentary initiatives) he wants to track and to filter information according to entries in a taxonomy related to VAT. Subsequently, Gerhard wants to be notified whenever a certain piece of information related to his particular information need is published by one of the identified sources. He wants to easily compare the changes applied to a certain law, for example, and be able to explore specifically related court decisions.

*Corresponding author. E-mail: [email protected].
1570-0844/16/$27.50 © 2016 – IOS Press and the authors. All rights reserved.

Such a scenario requires interaction and data exchange between different companies as well as several intelligent systems to process data and possibly enrich and interlink it. Single tools cannot solve those problems in isolation at large scale. Several interoperable components based on standards are required to achieve this. In this application report, we describe how Linked Data and tools from the Linked Data Stack can address those challenges. This application report builds on [1]. While the previous article focused on a detailed characterisation of the Linked Data Stack, this application report focuses on a detailed description of the usage scenarios at a global publisher.

In the past, publishers were a driving force behind document representation formats such as SGML and XML. Using that technology, usually a document-centric content management system is established. Today this technology is still a cornerstone of many publishers; however, the aforementioned rise in electronic publishing and electronic usage of documents challenges (the document-centric usage of) this technology. Electronic publishing requires separation of the core content (represented as XML) and metadata, smaller partitioning of the content, and the ability to include live external information.

Semantic technologies can help to support these processes. Publishers still deal with large amounts of unstructured, textual content. Knowledge extraction approaches can help to annotate and enrich such content. Once formalized knowledge (e.g. adhering to the RDF data model) is extracted, it needs to be stored, managed and made available for querying. Links to other knowledge bases, either from the publisher itself, from other content providers or from the Web of Data, need to be established. We can apply reasoning and machine learning techniques to enrich the knowledge bases with ontological structures. Since the original documents might change (like a new version of a law), we need processes for maintaining and further developing the extracted knowledge. Finally, semantic search, exploration and visualization techniques can help to gain new insights from the semantically represented content. The Linked Data Stack provides specialized tools for each of these lifecycle stages and can consequently be used to facilitate the semantic content processing workflows at a publisher.

The application report is structured as follows: In Section 2, we present the Linked Data Stack architecture at a high level. After that, the vision of the Linked Data lifecycle is explained in Section 3. The phases of this lifecycle are closely related to the data-flows at global publishers, which are described in Section 4. Based on this, in Section 5, we describe how the Linked Data Stack was applied at a particular publisher, Wolters Kluwer. We follow up with a brief summary of related work in Section 6 and plans for future work in Section 7.

2. Overview of the Linked Data Stack

The description of the Linked Data Stack (formerly known as the LOD2 Stack) and the Linked Data lifecycle (see Figure 1) are extensions of earlier work in [1] and [3]. The Linked Data Stack is an integrated distribution of components which support the whole lifecycle of Linked Data from extraction and authoring/creation via enrichment, interlinking and fusing to exploration. The Linked Data Stack is available at http://stack.linkeddata.org. The major components of the Linked Data Stack are open source, facilitating a wide deployment potential.

Through an iterative software development approach, the stack contributors aim at ensuring that the stack fulfills a broad set of user requirements and thus facilitates the transition to a Web of Data. The stack is designed to be versatile: by exploiting the Linked Data (RDF, SPARQL, OWL) paradigm as the main application interface, the plugging in of alternative (third-party) implementations is enabled. In order to fulfill these requirements, the architecture of the Linked Data Stack is based on three pillars:

1. Software integration and deployment using the Debian packaging system: The Debian packaging system is one of the most widely used packaging and deployment infrastructures. It facilitates packaging and integration as well as maintenance of dependencies between the various Linked Data Stack components. Using the Debian system also facilitates the deployment of the Linked Data Stack on individual servers, cloud or virtualization infrastructures.

2. The use of RDF as the data representation format and SPARQL as the data exchange mechanism creates a uniform and universal knowledge exchange bus. In its simplest form the bus is a central local RDF store; however, due to the built-in distributed nature of RDF and SPARQL, more complex setups are easily realized. All components of the Linked Data Stack access the knowledge via the bus and write their findings back to it. In order for other tools to make sense of the output of a certain component, it is important to exploit vocabularies. Vocabularies (or ontologies) provide semantics to the data for the information domain they cover. Using vocabularies allows information exchange beyond raw data.

3. Integration of the Linked Data Stack user interfaces based on REST APIs: The available components form a collection of user interfaces that are technologically and methodologically quite heterogeneous. The Linked Data Stack does not aim to resolve this heterogeneity, since each tool's UI is specifically tailored for a certain purpose. Instead, we develop common entry points for accessing and managing the data via REST APIs that offer dedicated support for one task, e.g. selecting a graph or validating a dataset. This approach fosters both the exploration of new ideas and functionalities and the reuse of already developed functionalities in other applications.

These three pillars comprise the methodological and technological framework for integrating the very heterogeneous Linked Data Stack components into a consistent framework.

Fig. 1. Stages of the Linked Data life-cycle supported by the Linked Data Stack.

3. The Linked Data Lifecycle

The different stages of the Linked Data life-cycle, depicted in Figure 1, include:

1. Storage: Efficient RDF data management techniques fulfilling the requirements of global publishers comprise column-store technology, dynamic query optimization, adaptive caching of joins, optimized graph processing and cluster/cloud scalability.

2. Authoring: The Linked Data Stack facilitates the authoring of rich semantic knowledge bases by leveraging semantic wiki technology, the WYSIWYM paradigm (What You See Is What You Mean) and distributed social, semantic collaboration and networking techniques.

3. Interlinking: Creating and maintaining links in a (semi-)automated fashion is still a major challenge and crucial for establishing coherence and facilitating data integration, as outlined in the publishing usage scenario in the introduction. We seek linking approaches yielding high precision and recall, which configure themselves automatically or with end-user feedback.

4. Enrichment: Linked Data on the Web is mainly raw instance data. For data integration, fusion, search and many other applications, however, we need this raw instance data to be classified into taxonomies. In the Linked Data Stack, semi-automatic components for this purpose are included.

5. Quality: The quality of content on the Data Web varies, just as the quality of content on the document web varies. The Linked Data Stack comprises techniques for assessing quality based on characteristics such as provenance, context, coverage or structure. The goal in our application scenarios is to assess whether data sources for a publisher are complete, consistent, reliable, etc.

6. Evolution/Repair: Data on the Web is dynamic. We need to facilitate the evolution of data while keeping things stable. Changes and modifications to knowledge bases, vocabularies and ontologies should be transparent and observable. The Linked Data Stack comprises methods to spot problems in knowledge bases and to automatically suggest repair strategies.

7. Search/Browsing/Exploration: For many users, the Data Web is still invisible below the surface. Therefore, search, browsing, exploration and visualization techniques for different kinds of Linked Data (i.e. spatial, temporal, statistical) are developed, making the Data Web sensible for real users.

We refer to [11] for detailed explanations and specific examples for each of those stages. These lifecycle stages, however, should not be tackled in isolation, but by investigating methods which facilitate a mutual fertilization of the approaches developed to solve these challenges. Examples of such mutual fertilization (synergies) between approaches include:

1. The detection of mappings on the schema level, for example, will directly affect instance-level matching and vice versa.

2. Ontology schema mismatches between knowledge bases can be compensated for by learning which concepts of one knowledge base are equivalent to which concepts of another knowledge base.

3. Feedback and input from end users (e.g. regarding instance or schema level mappings) can be taken as training input (i.e. as positive or negative examples) for machine learning techniques in order to perform inductive reasoning on larger knowledge bases, whose results can again be assessed by end users for iterative refinement.

4. Semantically enriched knowledge bases improve the detection of inconsistencies and modelling problems, which in turn results in benefits for interlinking, fusion and classification.

5. The querying performance of RDF data management directly affects all other components, and the nature of queries issued by the components affects RDF data management.

As a result of such interdependence, the Linked Data Stack leads to the establishment of an improvement cycle for knowledge bases on the Data Web. The improvement of a knowledge base with regard to one aspect (e.g. a new alignment with another interlinking hub) triggers a number of possible further improvements (e.g. additional instance matches).

Linked Data as a technological foundation is a firm basis on which to incrementally grow the interaction pattern. One can start with the minimal data flow sufficient to capture the core data management problem. Later on, the data flow can be extended with new interaction points, creating a higher-value data chain. The easy extensibility of data flows allows incorporating recently developed tools and techniques, so that at any time a data flow is served by the best component. For instance, a typical first step of publishers into Linked Data starts with the representation of their controlled vocabularies as SKOS vocabularies and updating their content to use the concept URIs instead of internal identifiers. In a second step, the data flow can be extended with automated annotation of the content using those SKOS vocabularies, and so on. This approach allows tackling the data value chain in a non-disruptive way, enhancing it on a by-need basis.

4. Data-Flows at Global Publishers

Wolters Kluwer Germany (WKG) is an information service provider in the legal, business and tax domain. Business units of WKG are divided into "legal and regulatory" as well as "tax and accounting" (see Fig. 2). The business unit "legal and regulatory" serves mainly legal professionals in several different legal domains with content, software and services. WKG is headquartered in Cologne and has about 1,000 employees in 20 offices located across Germany. Wolters Kluwer Germany is part of Wolters Kluwer n.v., a global information services company with customers in the areas of legal, business, tax, accounting, finance, audit, risk, compliance and healthcare. In 2011, the company had annual revenues of 3.4 billion Euro and 19,000 employees worldwide, with customers in over 150 countries across Europe, North America, Asia Pacific and Latin America. Wolters Kluwer is headquartered in Alphen aan den Rijn, the Netherlands. Its shares are quoted on Euronext Amsterdam (WKL) and are included in the AEX and Euronext 100 indices.

Fig. 2. Wolters Kluwer business units and products.

Wolters Kluwer's strategy has three main focuses:

1. To deliver value at the point-of-use by helping customers to manage complex transactions to produce tangible results;
2. To expand solutions across whole processes, customers and networks;
3. To raise innovation and effectiveness through global capabilities.

Assets like authoritative content, domain expertise and integrated workflow tools are the basis of the strategic direction.

The application of semantic technologies at WKG is piloted within the LOD2 EU project, which started in 2010, but has meanwhile gone beyond prototypical applications within a research project. The WKG content supply chain at the beginning of the LOD2 project was quite typical for a publishing business (see Fig. 3). It starts with the process of content acquisition. Legal content is in general obtained from different sources like public institutions (courts, ministries) or authors and is then internally refined and consolidated by domain experts. The resulting authoritative content represents a valuable asset for specialized publishers like Wolters Kluwer but requires, in its traditional form, a lot of resources.

Fig. 3. Content supply chain at Wolters Kluwer.

Afterwards, content is mostly manually classified, enriched and linked in further workflow steps by domain experts and technical writers. These actions generate significant additional value for contents and their usage in different media and platforms. Subsequently, product managers collect content for their products and compose or bundle it individually. But as the previous process of enhancing the contents is quite complex and labour-intensive, the selection and bundling is often limited to the most obvious and necessary actions. Content is therefore mainly provided via one distinct product without additional informative value to the customers. This fact impacts the sales process, where only products are offered that are barely connected with each other or offered with external content. So in order to provide a better customer service, the core content supply chain has to be improved.

5. Usage of the Linked Data Stack at WKG

When WKG, in particular the first authors of this application report, first investigated the paradigm of Linked Data, the respective lifecycle and the Linked Data Stack supporting this lifecycle, we concluded the following: the Linked Data lifecycle is highly comparable to the existing workflows at Wolters Kluwer as an information provider, and the Linked Data Stack offers relevant functionality and technology complementary to the existing content management and production environments.

It was decided to explore the impact of Linked Data by setting up pilots using the Linked Data Stack. Each of the pilots tackled a crucial functionality or new opportunity. Using this approach, experience, usage and development guidelines, best practices and current restrictions of the components available in the Linked Data Stack were collected. The pilots not only challenged the Linked Data Stack, but also the WKG business managers. They provided requirements for challenges that could hardly be solved with the existing technological infrastructure. Some of the major business requirements were:

1. Processing and enriching mass content from partner publishing houses into our products without having to handle this content within the standard content supply chain. This included the necessity to separate the source text from the (meta)data and to store the metadata in a dedicated repository. This also meant relying as much as possible on unified controlled vocabularies, in order to ensure consistency across content sources.

2. Extension and consolidation of our controlled vocabularies. The extended usage of post-search filters and typed auto-suggest functionalities required working extensively on controlled vocabularies. Once they were consolidated, we were able to offer better product features, but were also able to connect these vocabularies to external data sources and enrich our data even more.

3. Inclusion of product-specific variants of metadata associated with our documents: The transition from print to electronic content led to the requirement that elements like "title" or some
structural information needed to be different in independent from the textual sources, in order to the different media, even different in different ensure flexibility. 4. Enabling a vertical view on content, based on a electronic products - like on a desktop customer group specific angle (e.g. "law office" vs. a mobile application. This should be handled vs. "HR department in a company"). The diversi- Dirschl et al. / Facilitating WKD Data Flows using the Linked Data Stack 7

fication of electronic products implied the neces- wanted to represent using controlled vocabularies and sity to add quite different clusters of information metadata that is not explicitly controlled. In case there connected to one piece of text, e.g. a law. This is a controlled vocabulary, the data extraction chain has led even to the situation that the main complexity to be extended with a reconsiliation step. The provided in the content processing was based on metadata information sometimes only covers the label instead of handling and not on text handling anymore. the identifier of the concept. This situation has been dealt with by applying a reconsilation step in which the Already in an early stage of the LOD2 project it label is replaced with a proper reference to the target- was observed that project results should immediately ted identifier. In case the reconsilation service did pro- be evaluated by the WKG internal technology and con- vided none or multiple answers the orginal extracted tent experts. By taking up the findings and leverage the data was kept in the resulting data. By quering the project results in the operational planning, the resource knowledge base afterwarts these cases were retrieved spending in lecacy technology could be reduced. and used to improve the controlled vocabularies or to In general, there were two approaches for using correct the legal documents. This investigation process tools from the Linked Data stack in an industrial en- naturally drives to higher quality data. The manage- vironment: Taking the toolset and integrating it into ment of the controlled vocabularies and the associated our internal processes as open source software vs. ap- reconsiliation service are provided by the PoolParty proaching the vendor of the tool in order to license solution. an enhanced commercial version. 
Since WKG did not The assessment of candidates for controlled vocab- have any internal know-how on this technology, WKG ularies revealed that although they look natural candi- preferred to partner with commercial vendors when dates the creation of a taxonomy; the actual needed ef- available, which had the advantage that we were get- fort to create a correct disambiguated list was beyond ting professional support and the assurance that main- the project budget. For instance the author list showed tenance and further development was guaranteed (e.g. that there are several authors with the same name and Virtuoso and PoolParty). In all other cases pure open- that they have to be disambiguated per document. source was chosen, e.g. for linking we used the SILK The convertion into properly managed taxonomies framework. in PoolParty for those lists that were selected, raised Like each data management project, the work in the the fundamental Linked Data concept of persistent beginning was to extract from the provided raw le- identifiers to be considered by WKG. Without a URI gal content the required information into RDF format. strategy in place sustainable information management Each provided XML document was converted into a is impossible. knowledge base containing the metadata (e.g. title, au- The outcome of the extraction process: the orig- thor, reference number, ...) and structural information inal XML documents and the associated extracted (e.g. chapter, sections, paragraphs, ...). The actual legal metadata as RDF are published using Virtuoso on a clause texts were initially not included, however for a SPARQL endpoint. The PoolParty publishing features later excercise the extraction process has been updated have been used to publish the controlled vocabularies to include them too. as SKOS on another SPARQL endpoint. For quality assurance en focussing reasons, it was Based on the extraction work, WKG decided to inte- decided to iterative refine the extraction rules. 
The pro- grated Virtuoso and PoolParty a metadata management cess of transformation rules writing, processing doc- solution within our operational system. This metadata uments, reviewing and step-wise extensions is labor management solution realizes the main requirement intensive work that must be supported with adaquate for further exploitation of the WKG legal content be- tooling. In our case, the provided amount of legal doc- yond the current practice: namely a central single stor- uments was sizable. Because of the document oriented age of metadata. This central data storage does not in- conversion approach, the scaling has been built around clude only the metadata already available in the legal this dimension. The tool chain consists of the Valiant documents but also supporting information from inter- tool parallelizing the processing of XML documents to nal and external sources. For instance, product varia- RDF, in combination with the bulk loading of Virtu- tions or publication status information can be stored. oso. We knew from our previous experience, that it was A lesson learned from this transformation task was not possible to implement one stable and fixed rela- that we have to differentiate between metadata that we tional data model, since the business requirements and 8 Dirschl et al. / Facilitating WKD Data Flows using the Linked Data Stack the technical developments concerning data and meta- on a prototype level and will gain business importance data were changing so rapidly, that one main feature of over time. One side effect of this development is that the application needed to be "data model flexibility". multilingual applications also gain importance. People Since there was no fixed schema necessary when using search in their native language for content published RDF, this requirement was easily met. in another language. 
Multilingual thesauri, which are already part of the above discussed infrastructure, play a central role in addressing this issue.

Despite the growing amount of available Linked Data, the absence of sufficient (public) machine-readable legal resources in many European countries led to the decision by WKG to publish legal resources. It served as an excellent marketing tool showing WKG's expertise, but more importantly it initiated discussions within the publishing industry as well as within the Linked Data community and public bodies. An outcome of this increased interest is that this work has been used to motivate the adoption of the SKOS standard as a recommended standard (comply-or-explain) for publishing taxonomies by the Dutch government.

We used the Linked Data Stack component OntoWiki as the user interface for the presentation and maintenance of the metadata. It offers search and browse capabilities and is inherently based on RDF. In contrast to other components, introducing this frontend into the WKG operational environment is more complicated because it affects the way of working throughout the entire company, especially for the content editors. Therefore, a different strategy has been chosen for transferring the experience to the existing editorial environment: it has been decided to re-use and adapt the existing interfaces to include the enhanced metadata assignment. This approach limits the impact on the editorial departments. For completely new tasks outside the existing work environment of the editorial staff, like adding relations (e.g. court names with geodata) or analyzing data (like popular legal topics in court cases over time), direct access to the metadata database is our preferred approach.

The legal publishing market is still a national market. Currently, cross-border offerings are only relevant in certain areas like intellectual property law. However, there are indications that this will change over time, especially within the European Union, where more and more legislation is passed at the European level. Having information available as machine-readable Linked Data gives publishers the possibility to more easily align and interconnect the different CMS systems used by the enterprise entities in different countries in order to generate a comprehensive offering. This is already tested.

One major characteristic of the legal domain is the fact that legal matters and processes are highly connected to each other. This is, for example, reflected by a large number of explicit relations between documents. The preservation and usage of these relationships was of key importance, for example, to implement proper search capabilities. RDF gave us the possibility to easily establish these relationships.

The creation and maintenance of domain-specific knowledge models required expertise and resources. In order to minimize this effort, one requirement was to be able to integrate external data sources in a controlled fashion as much as possible (e.g. DBpedia Live [10] information or publicly available domain thesauri). We used SILK and PoolParty for connecting our controlled vocabulary to external sources, and the generic SILK framework for other metadata interlinking. DBpedia data were meaningful and helpful for concrete use cases. However, some of the data (e.g. geolocations) had been transformed incorrectly into DBpedia and caused errors when used. These issues were communicated to the community and got resolved.

New usage scenarios for metadata were introduced, both from an internal process point of view as well as from a functionality point of view in our products. We needed to classify our documents according to legal domain structures. The knowledge represented in the metadata and their relations available in Virtuoso gave us the possibility to achieve the required classification quality. Another scenario was the qualified generation of an auto-suggest functionality, where the keywords shown to the user when typing a query come directly from our knowledge base and are therefore already normalized and prioritized.

Tool support for maintaining knowledge models is a very prominent requirement. Within the Linked Data Stack, the ontology enrichment and repair tool ORE provides support in this area. Our experiment's objective was to harmonize the schema and the instance data. Although some situations could be tackled, the experiment was too limited to determine how this kind of tool can be used in the domain of legal information. Further investigation is required.

To sum up, the objectives of deploying tools from the Linked Data Stack in WKG's operational systems were mainly targeted at making our internal

Dirschl et al. / Facilitating WKD Data Flows using the Linked Data Stack 9

Labour Law Thesaurus
  Description: covers all main areas of labour law, like the roles of employee and employer; legal aspects around labour contracts and dismissal; also co-determination and industrial action
  Concepts: 1,728
  Linked sources: Standard Thesaurus für Wirtschaft, ZBW (zbw.eu/stw/); DBpedia; TheSoz from Leibniz Gesellschaft für Sozialwissenschaften (www.gesis.org); Eurovoc
  URL: http://vocabulary.wolterskluwer.de/arbeitsrecht.html
  SPARQL endpoint: vocabulary.wolterskluwer.de/PoolParty/sparql/arbeitsrecht

Courts Thesaurus
  Description: structures German and European courts in a hierarchical fashion and includes e.g. address information or map visualization
  Concepts: 1,499
  Linked sources: DBpedia
  URL: http://vocabulary.wolterskluwer.de/court.html
  SPARQL endpoint: vocabulary.wolterskluwer.de/PoolParty/sparql/court

Licenses (both thesauri): the data is licensed under the 'Creative Commons Namensnennung 3.0 Deutschland (CC BY 3.0)' license; the data model is licensed under the 'ODbL' license; links to external sources are licensed under the 'CC0 1.0 Universal (CC0 1.0) Public Domain Dedication' license.

Table 1

Description of the Labour Law and Courts thesauri published by WKG.

content processes more flexible and efficient, but also targeted new features for WKG's electronic and software products. Once the technological basis was laid, new opportunities for further enhancements immediately showed up, so that this new part of our technical infrastructure has already gained importance, and there is no doubt that this process will continue. The tools currently used from the Linked Data Stack are well integrated with each other, which enables an efficient workflow and processing of information. URIs in PoolParty based on controlled vocabularies are used by Valiant for the content transformation process and stored in Virtuoso, so that they can easily be queried via SPARQL and displayed in OntoWiki. The usage of the Linked Data Stack as such has the major advantage that the installation is easy and issues around different versions not working smoothly with each other are avoided. All these are major advantages compared to the separate implementation of individual tools.

A major challenge, however, is not the new technology as such, but a smooth integration of this new paradigm into our existing infrastructure and a stepwise replacement of old processes with the new and enhanced ones.

To summarise the application from a software perspective, we list the Linked Data Stack components which were deployed in the content production and management processes at WKG (cf. Figure 4; general statistics in Table 3):

1. Valiant is an extraction/transformation tool that uses XSLT to transform XML documents into RDF. The tool can access data from the file system or a WebDAV repository. It outputs the resulting RDF to disk, to WebDAV or directly to an RDF store. For each input document a new graph is created.

2. Virtuoso [4] is an enterprise-grade multi-model data server. It delivers a platform-agnostic solution for data management, access, and integration. Virtuoso provides a fast quad store with a SPARQL endpoint and WebID support.

3. PoolParty [12] is a tool to create and maintain multilingual SKOS (Simple Knowledge Organisation System) thesauri, aiming to be easy to use for people without a Semantic Web background or special technical skills. PoolParty is written in Java and uses the SAIL API, whereby it can be utilized with various triple stores. Thesaurus management itself (viewing, creating and editing SKOS concepts and their relationships) can be performed in an AJAX front-end based on the Yahoo User Interface (YUI) library.

4. OntoWiki [2] is a PHP5/Zend-based Semantic Web application for collaborative knowledge base editing. It facilitates the visual presentation of a knowledge base as an information map, with different views of instance data. It enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents.
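The XSLT-based mapping performed by Valiant (item 1) can be sketched with a minimal stylesheet that turns a law document's header fields into RDF/XML. The XML element names and the `wkd:` properties below are invented for illustration and are not Valiant's actual mapping rules:

```xml
<!-- Sketch: map a hypothetical law document's header to RDF/XML.
     Element names (law, header/date, header/abbrev) and the
     wkd: properties are assumptions for the example. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:wkd="http://schema.wolterskluwer.de/">
  <xsl:template match="/law">
    <rdf:RDF>
      <rdf:Description rdf:about="{@uri}">
        <wkd:legislationDate>
          <xsl:value-of select="header/date"/>
        </wkd:legislationDate>
        <wkd:lawAbbreviation>
          <xsl:value-of select="header/abbrev"/>
        </wkd:lawAbbreviation>
      </rdf:Description>
    </rdf:RDF>
  </xsl:template>
</xsl:stylesheet>
```

Each input document would yield one such RDF/XML result, which matches the one-graph-per-document behaviour described above.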

Fig. 4. Content management process at WKG and usage of Linked Data Stack components.
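The controlled vocabularies maintained in PoolParty and flowing through this process are plain SKOS. A concept from the Courts thesaurus could look roughly as follows in Turtle; the concept URI, labels and links are illustrative rather than taken from the published thesaurus:

```turtle
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix court: <http://vocabulary.wolterskluwer.de/court/> .

# Hypothetical concept for the Federal Labour Court.
court:BAG a skos:Concept ;
    skos:prefLabel  "Bundesarbeitsgericht"@de ;
    skos:altLabel   "BAG"@de ;
    skos:broader    court:Bundesgerichte ;
    skos:exactMatch <http://dbpedia.org/resource/Federal_Labour_Court> .
```

The `skos:exactMatch` statement is the kind of Courts-to-DBpedia interlink counted in Table 4.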

5. Silk [5] is a link discovery framework that supports data publishers in setting explicit links between two datasets. Using the declarative Silk Link Specification Language (Silk-LSL), developers can specify which types of RDF links should be discovered between data sources as well as which conditions data items must fulfill in order to be interlinked. These link conditions may combine various similarity metrics and can take the graph around a data item into account using an RDF path language.

6. ORE [8] (Ontology Repair and Enrichment) allows knowledge engineers to improve an OWL ontology or a SPARQL-endpoint-backed knowledge base by fixing logical errors and making suggestions for adding further axioms to it. ORE uses state-of-the-art methods to detect errors and highlight the most likely sources of the problems. To harmonise schema and data in the knowledge base, algorithms of the DL-Learner [6,7,9] framework are integrated.

An important asset of the Linked Data Stack is the fact that components interact and support specific steps of the data transformation lifecycle and management process. Figure 4 shows the already operationally implemented interplay of Linked Data Stack components in the processes of WKG.

A document, e.g. a law, is transformed from XML into RDF with the Valiant tool. The RDF (meta)data is stored in Virtuoso and managed using a customized version of OntoWiki (for document statistics see Table 2). Examples for such data in legislation are the "legislation date", "law abbreviation" or "legislation type". Within this environment, technical editors / domain experts can add further metadata or change existing metadata via editing features.

Controlled vocabularies are managed in PoolParty. Thesaurus concepts can be created, ordered and edited in the system. Both metadata management systems interact in such a way that, e.g., vocabularies from PoolParty can serve filtering functionalities for documents in OntoWiki. This way, we can browse in the navigation pane for a specific "legislation type", an "area of law", "authors" or "courts".

External sources are linked using the Silk framework to document metadata in OntoWiki on the one hand and to controlled vocabularies in PoolParty on the other hand (for external sources see Table 4). In the case of a law, there could be an enrichment of the

document by the area of law or the jurisdiction (i.e. geographical coverage) of a law. Vocabularies can be enriched by abstracts or synonyms. This linking enriches the documents and supports further functionality, especially in the case of controlled vocabularies.

Table 2
Frequently used document classes.

Top level document type URI                    | Document Count
all documents                                  | 674884
http://schema.wolterskluwer.de/aufsaetze       | 218186
http://schema.wolterskluwer.de/beitraege       | 24611
http://schema.wolterskluwer.de/entscheidungen  | 398398
http://schema.wolterskluwer.de/kommentare      | 26653
http://schema.wolterskluwer.de/lexikon         | 47
http://schema.wolterskluwer.de/rezensionen     | 99
http://schema.wolterskluwer.de/vorschrift      | 6890

Table 3
Overall statistics.

Measure  | Count
Triples  | 42969471
Graphs   | 830090
Entities | 6812303

Table 4
Link statistics.

Link Target                                      | Count
Extended Version of Courts Thesaurus to DBpedia  | 997
Labor Law Thesaurus to DBpedia                   | 776
Labor Law Thesaurus to TheSoz                    | 443
Labor Law Thesaurus to STW                       | 289
Labor Law Thesaurus to Eurovoc                   | 247
Legislations to DBpedia                          | 155
Authors to GND                                   | 941
WKG Labor Law Thesaurus to WKG subjects          | 70

For quality control, ORE and the included DL-Learner library are used. In a first step, ORE analyzes the existing instance data and suggests class descriptions for each class contained in the domain ontology. For instance, it suggested that each document of type "aufsatz" (German expression for article) has at least a title, an editor or creator, as well as information about the start and end position on the page. Based on the suggestions, a domain expert creates and refines the schema. Afterwards, the axioms in the schema are used as constraints and converted into SPARQL queries, which allows for the detection of instance data that does not fulfill the requirements, e.g. finding instances of "aufsatz" where the title is missing. Technically, expressive OWL schema axioms are used as constraints by employing a closed world assumption.

6. Related Work

There are other publishers that already use semantic technologies to enhance their own online publishing processes, although we believe that WKG applies a richer set of tools covering many Linked Data lifecycle stages. The New York Times, one of the largest American daily newspapers, has published its index as RDF since 2009. About 10,000 concepts such as persons, organizations, locations or descriptors that are used for tagging articles are published under a CC-BY license with respective metadata and links to DBpedia, to Freebase or into the Times API¹. The BBC, a British public service broadcasting statutory corporation, uses its dynamic semantic publishing architecture to enhance content processing workflows for its sport website. Contents are embedded in an ontological, domain-modelled information architecture that enables automated aggregation processes and publishing as well as re-purposing of interrelated content objects.²

¹ See http://data.nytimes.com/.
² See http://www.bbc.co.uk/blogs/bbcinternet/2012/04/sports_dynamic_semantic.html.

7. Future Work

Ingestion of more data in general, and the inclusion of more external information in particular, is one major topic; but preparing this conglomerate of information for real-world usage in an industrial environment is also a major challenge. The latter covers e.g. issues around quality, governance and licensing. In detail, we will focus on the following tasks:

First of all, we will work on the further deployment and adoption of the Linked Data tool stack to further enhance our metadata sets. We will focus on repair and enhancement of the existing RDF schema, automatic classification of contents, entity extraction and improved functionality of the metadata management tools. We are especially aiming for improvements of metadata management workflows through enhanced usability or functionality in order to speed up semi-automatic processes.
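The conversion of schema axioms into SPARQL queries, as used for the closed-world quality checks described above, can be sketched as follows. The class URI is taken from Table 2 (used here as a class for "aufsatz" documents), while `dcterms:title` merely stands in for whatever title property the actual schema uses:

```sparql
# Closed-world check (sketch): find documents of type "aufsatz"
# that violate the axiom "every aufsatz has at least one title".
# dcterms:title is an assumed property, not WKG's actual schema.
PREFIX wkd:     <http://schema.wolterskluwer.de/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?doc WHERE {
  ?doc a wkd:aufsaetze .
  FILTER NOT EXISTS { ?doc dcterms:title ?title }
}
```

Run against the Virtuoso endpoint, such a query returns exactly the instances that fail the constraint, which can then be routed back to the editorial staff.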


Fig. 5. Workflow illustrating how data was transformed from XML to RDF, enriched via Poolparty and Pebbles and then published in various user interfaces.
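The frontends at the end of this workflow retrieve their data from the triple store via SPARQL; the faceted browsing described earlier (e.g. filtering by "legislation type" or by court) boils down to queries of the following shape. The document class comes from Table 2, while the property names and the concept URI are hypothetical:

```sparql
# Sketch: list court decisions of one (hypothetical) court, newest first.
# wkd:court and wkd:legislationDate are assumed property names.
PREFIX wkd: <http://schema.wolterskluwer.de/>

SELECT ?doc ?date WHERE {
  ?doc a wkd:entscheidungen ;                              # document class from Table 2
       wkd:court <http://vocabulary.wolterskluwer.de/court/BAG> ;
       wkd:legislationDate ?date .
}
ORDER BY DESC(?date)
```

Because the filter values are thesaurus concepts rather than free-text strings, the same query works unchanged for every court in the Courts thesaurus.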

Concerning the interlinking processes, we plan to explore new external sources such as GND (http://datahub.io/dataset/dnb-gemeinsame-normdatei) for the enrichment of our datasets and to investigate new opportunities to improve the generation and quality of new vocabularies.

In order to improve the interface and authoring options, we will invest in enhancements to the publishing, search, browsing and exploration of metadata sets within metadata management.

The topic of licensing strategies, practices and recommendations is another important area, in particular when datasets from various sources licensed under different licensing schemes are fused.

Acknowledgements

This work was supported by grants from the European Union's 7th Framework Programme provided for the projects LOD2 (GA no. 257943) and GeoKnow (GA no. 318159).

References

[1] S. Auer, L. Bühmann, C. Dirschl, O. Erling, M. Hausenblas, R. Isele, J. Lehmann, M. Martin, P. N. Mendes, B. van Nuffelen, C. Stadler, S. Tramp, and H. Williams. Managing the life-cycle of linked data with the LOD2 stack. In Proceedings of the International Semantic Web Conference (ISWC 2012), 2012.
[2] S. Auer, S. Dietzold, and T. Riechert. OntoWiki - A Tool for Social, Semantic Collaboration. In 5th Int. Semantic Web Conference, ISWC 2006, volume 4273 of LNCS, pages 736–749. Springer, 2006.

[3] S. Auer and J. Lehmann. Making the web a data washing machine - creating knowledge out of interlinked data. Semantic Web Journal, 2010.
[4] O. Erling. Virtuoso, a hybrid RDBMS/graph column store. IEEE Data Eng. Bull., 35(1):3–8, 2012.
[5] A. Jentzsch, R. Isele, and C. Bizer. Silk - generating RDF links while publishing or consuming linked data. In ISWC 2010 Posters & Demo Track, volume 658. CEUR-WS.org, 2010.
[6] J. Lehmann. DL-Learner: learning concepts in description logics. Journal of Machine Learning Research (JMLR), 10:2639–2642, 2009.
[7] J. Lehmann, S. Auer, L. Bühmann, and S. Tramp. Class expression learning for ontology engineering. Journal of Web Semantics, 9:71–81, 2011.
[8] J. Lehmann and L. Bühmann. ORE - a tool for repairing and enriching knowledge bases. In 9th Int. Semantic Web Conference (ISWC 2010), LNCS. Springer, 2010.
[9] J. Lehmann and P. Hitzler. Concept learning in description logics using refinement operators. Machine Learning journal, 78(1-2):203–250, 2010.
[10] M. Morsey, J. Lehmann, S. Auer, C. Stadler, and S. Hellmann. DBpedia and the Live Extraction of Structured Data from Wikipedia. Program: electronic library and information systems, 46:27, 2012.
[11] A.-C. N. Ngomo, S. Auer, J. Lehmann, and A. J. Zaveri. Introduction to linked data and its lifecycle on the web. In Reasoning Web. 2014.
[12] T. Schandl and A. Blumauer. PoolParty: SKOS thesaurus management utilizing linked data. In 7th Extended Semantic Web Conf., ESWC 2010, volume 6089 of LNCS, pages 421–425. Springer, 2010.