Automatisierte Layoutformatierung Für Printmedien Mit CSS (PDF)

Total Page:16

File Type:pdf, Size:1020Kb

Automatisierte Layoutformatierung Für Printmedien Mit CSS (PDF) Ruprecht-Karls-Universität Neuphilologische Fakultät Institut für Computerlinguistik Bachelorarbeit 23.03.2017 Automatisierte Layoutformatierung für Printmedien mit CSS „CSS paged media module level 3“ in der Anwendung im Vgl. zu XSL-FO Erstbetreuer: Prof. Dr. Andreas Witt Zweitbetreuer: Manuel Montero Pineda Zweitgutachterin: Prof. Dr. Katja Markert Svenja Lohse (7. Semester) HF: Computerlinguistik (75%) Römerstr. 131 NF: Bildungswissenschaft (25%) 69126 Heidelberg Tel: 0162 8620613 Email: [email protected] Abstract Books and other printed media are still a favorite way to save and exchange knowledge and thoughts. These contents have to get enriched with meta data to be logically, usefully and clearly stored and representated. Through syntactical and semantical connections linguistic meta data enables specific analysis, extraction and processing amounts of data. Also the layout of texts, informations etc. works with these meta data and is important for understanding the content. Cross media publishing is interested in designing layouts for different output media but with the same or few programming languages. This thesis explores empirically the work ability of CSS level 3 (CSS3), especially CSS paged media module level 3, for creation of print layouts in practice in contrast to XSL-FO. How is it working and what is CSS3 missing to get a more professional result? Data base is MARC-XML from the „Deutsche Nationalbibliothek” datashop, which is processed by XSLT to a HTML5 document. On this CSS3 properties, grouped by typographical concepts, are tried out rendered by Antenna House Formatter. The subject combination of computer science (XML technologies), publishing (layout, cross media publishing) and linguistics (enrichment of content with markup languages) makes further research and developement of automatical CSS layout processing for printed media interesting. The study comes to the conclusion that CSS3 has a lot of things to improve before managing professional print layouts like XSL-FO does. But simple documents and simple, mean formatting requests are executed easily and fastly. The person developing and controlling CSS stylesheets doesn’t have to be a professional layout designer to get decent output. More browser compatibility would be helpful so more user could test scenarios (like „digital first” process, from website to print) and give feedback. This could be used for evaluation and advancement of CSS3. In conclusion, CSS3 is improvement-needy but also upgradeable and suitable for future of professional print layout creation. 2 Inhaltsverzeichnis Abkürzungsverzeichnis ................................................................................................................. 5 Typografische Konventionen ........................................................................................................ 7 1 Einleitung .............................................................................................................................. 8 1.1 Ausgangssituation und Herausforderung ...................................................................... 8 1.2 Ziel und Vorgehen ....................................................................................................... 10 1.3 Motivation für die Computerlinguistik ........................................................................ 10 2 Daten und Vorprozess ......................................................................................................... 11 2.1 Metadaten in XML ...................................................................................................... 11 2.2 Ausgangsformat MARC-XML und andere linguistische Metadatenformate .............. 12 2.2.1 MARC21 und MARC-XML .................................................................................. 12 2.2.2 Dublin Core ......................................................................................................... 14 2.2.3 IMDI .................................................................................................................... 15 2.2.4 TEI ....................................................................................................................... 15 2.2.5 CMDI ................................................................................................................... 16 2.3 HTML5 als Zielformat ................................................................................................ 17 2.4 Überführung/Transformation der Ausgangsdaten mit XSLT ...................................... 18 3 Überblick XSL-FO und CSS ............................................................................................... 19 3.1 XSL-FO ....................................................................................................................... 19 3.2 CSS(3) ......................................................................................................................... 20 3.3 Vergleichbarkeit .......................................................................................................... 22 3.4 Formatiererwahl .......................................................................................................... 23 4 Umsetzung typografischer Konzepte .................................................................................. 24 4.1 Grundstruktur .............................................................................................................. 25 4.1.1 XSL-FO ................................................................................................................ 25 4.1.2 CSS [CSS3PAGE] ............................................................................................... 26 4.1.3 Fazit Abschnitt ..................................................................................................... 28 4.2 Seitenformatierung ...................................................................................................... 28 4.2.1 Mehrspaltensatz ................................................................................................... 28 4.2.2 Umbrucheinstellungen ......................................................................................... 30 4.2.3 Seitenvorlagen ..................................................................................................... 35 4.2.4 Selektoren ............................................................................................................ 38 4.3 Inhaltstext .................................................................................................................... 39 4.3.1 Absatz- und Zeilenformatierung – Blöcke, Inline und Inline-Blöcke ................. 40 4.3.2 Wort- und Zeichenformatierung .......................................................................... 45 3 4.4 Weitere Inhaltselemente .............................................................................................. 51 4.4.1 Listen ................................................................................................................... 51 4.4.2 Tabellen ............................................................................................................... 54 4.4.3 Abbildungen ........................................................................................................ 57 4.4.4 Zwischenfazit ...................................................................................................... 61 4.5 Automatisch generierte Elemente................................................................................ 62 4.5.1 Fußnoten .............................................................................................................. 62 4.5.2 Marginalien ......................................................................................................... 65 4.5.3 Verweise .............................................................................................................. 74 4.5.4 Exkurs: Lesezeichen ............................................................................................ 82 4.5.5 Vorbereitung Druck ............................................................................................. 83 5 Lösungsmöglichkeiten und Ausblicke ................................................................................ 85 5.1 Vorteile und Schranken von CSS ................................................................................ 85 5.2 Bestehende Erweiterungsmöglichkeiten ..................................................................... 86 5.3 Ausblick ...................................................................................................................... 86 6 Schlussfolgerung/Bilanz ...................................................................................................... 87 7 Quellen ................................................................................................................................ 89 7.1 Printwerke ................................................................................................................... 89 7.2 Internetquellen ............................................................................................................. 89 7.3 Abbildungsquellen....................................................................................................... 95 8 Anhang ................................................................................................................................ 96 8.1 Codes ..........................................................................................................................
Recommended publications
  • Publishing Pdfs from DITA
    Publishing PDFs from DITA Benefits and Limitations of XSL‐FO Benefits Introduction Low entry cost XSL‐FO is an XML language Scott Prentice, President of Leximation, Inc. FO markup language is largely based on CSS Specializing in FrameMaker plugin development as well as Designed to work for all written human languages structured FrameMaker conversions, consulting, and development. FrameMaker user/developer since 1991. Limitations Developed DITA‐FMx, a FrameMaker plugin for efficient XSL‐FO coding is expensive and complicated DITA authoring and publishing. Tables that span pages may not break as expected Developer of custom Help systems and creative/functional No way to render elements on one page in relation to web applications. another page Coined the term “A I R Help” in 2007 after learning about Difficult to apply balanced vertical spacing on a page Adobe’s new AIR technology. Processors may use extensions to implement features, so the Interested in creating innovative ways to provide user FO stylesheets may not be portable between processors assistance that is actually used. Apache FOP What is DITA (briefly) Open Source / Free DITA ‐ Darwin Information Typing Architecture Java application, runs on ʺanyʺ platform An XML format for authoring in a topic oriented structure Default PDF output option provided with the DITA‐OT Highly reusable and modular Rendering engine uses XSL‐FO DITA topics can be easily rearranged and reused for Requires XSL‐FO developer to maintain stylesheets different deliverable types Supports
    [Show full text]
  • Xmlmind XSL Utility - Online Help
    XMLmind XSL Utility - Online Help Hussein Shafie, XMLmind Software <[email protected]> February 23, 2021 Table of Contents 1. Overview ................................................................................................................................ 1 2. Running XMLmind XSL Utility ............................................................................................... 3 2.1. System requirements ..................................................................................................... 3 2.2. Installation ................................................................................................................... 3 2.3. Contents of the installation directory .............................................................................. 3 2.4. Starting XMLmind XSL Utility ..................................................................................... 4 2.5. XMLmind XSL Utility as a command-line tool .............................................................. 5 3. Converting an XML document to another format ....................................................................... 6 3.1. Canceling the current conversion process ....................................................................... 7 4. Changing the parameters of a conversion .................................................................................. 8 5. Specifying a conversion ......................................................................................................... 10 5.1. Specifying the conversion
    [Show full text]
  • Markup UK 2021 Proceedings
    2021 Proceedings A Conference about XML and Other Markup Technologies Markup UK 2021 Proceedings 2 Markup UK 2021 Proceedings 3 Markup UK 2021 Proceedings Markup UK Sister Conferences A Conference about XML and Other Markup Technologies https://markupuk.org/ Markup UK Conferences Limited is a limited company registered in England and Wales. Company registration number: 11623628 Registered address: 24 Trimworth Road, Folkestone, CT19 4EL, UK VAT Registration Number: 316 5241 25 Organisation Committee Geert Bormans Tomos Hillman Ari Nordström Andrew Sales Rebecca Shoob Markup UK 2021 Proceedings Programme Committee by B. Tommie Usdin, David Maus, Syd Bauman – Northeastern University Alain Couthures, Michael Kay, Erik Digital Scholarship Group Siegel, Debbie Lapeyre, Karin Bredenberg, Achim Berndzen – <xml-project /> Jaime Kaminski, Robin La Fontaine, Abel Braaksma – Abrasoft Nigel Whitaker, Steven Pemberton, Tony Peter Flynn – University College Cork Graham and Liam Quin Tony Graham – Antenna House Michael Kay – Saxonica The organisers of Markup UK would like to Jirka Kosek – University of Economics, thank Antenna House for their expert and Prague unstinting help in preparing and formatting Deborah A. Lapeyre – Mulberry the conference proceedings, and their Technologies generosity in providing licences to do so. David Maus – State and University Library Hamburg Antenna House Formatter is based on the Adam Retter – Evolved Binary W3C Recommendations for XSL-FO and B. Tommie Usdin – Mulberry Technologies CSS and has long been recognized as Norman Walsh – MarkLogic the most powerful and proven standards Lauren Wood – XML.com based formatting software available. It is used worldwide in demanding applications Thank You where the need is to format HTML and XML into PDF and print.
    [Show full text]
  • XSL-FO by Dave Pawson Publisher
    XSL-FO By Dave Pawson Publisher : O'Reilly Pub Date : August 2002 ISBN : 0-596-00355-2 Pages : 282 Table of • Contents • Index • Reviews Reader • Reviews Extensible Style Language-Formatting Objects, or XSL-FO, is a set of tools developers and web designers use to describe page printouts of their XML (including XHTML) documents. XSL-FO teaches you how to think about the formatting of your documents and guides you through the questions you'll need to ask to ensure that your printed documents meet the same high standards as your computer-generated content. 777 Copyright Preface Who Should Read This Book? What Does This Book Cover? Motivation Organization of This Book What Else Do You Need? Conventions Used in This Book How to Contact Us Acknowledgments Chapter 1. Planning for XSL-FO Section 1.1. XML and Document Processing Section 1.2. Choosing Your Print Production Approach Section 1.3. Choosing Tools Section 1.4. The Future for XSL-FO Chapter 2. A First Look at XSL-FO Section 2.1. An XSL-FO Overview Section 2.2. Related Stylesheet Specifications Section 2.3. Using XSL-FO as Part of XSL Section 2.4. Shorthand, Short Form, and Inheritance Chapter 3. Pagination Section 3.1. Document Classes Section 3.2. The Main Parts of an XSL-FO Document Section 3.3. Simple Page Master Section 3.4. Complex Pagination Section 3.5. Page Sequences Chapter 4. Areas Section 4.1. Informal Definition of an Area Section 4.2. Area Types Section 4.3. Components of an Area Section 4.4.
    [Show full text]
  • DITA Open Toolkit 2.0 This Document Describes the DITA Open Toolkit Project—What the Project Is, and How to Use the Site
    DITA Open Toolkit 2.0 This document describes the DITA Open Toolkit project—what the project is, and how to use the site. What is the DITA Open Toolkit? The DITA Open Toolkit, or DITA-OT for short, is a set of Java-based, open source tools that provide processing for DITA maps and topic content. You can download the OT and install it for free on your computer to get started with topic-based writing and publishing. The DITA-OT is licensed under the CPL 1.0 and Apache 2.0 open source licenses. Note: While the DITA Standard itself is owned and developed by OASIS, the DITA Open Toolkit is an independent, open source implementation of the standard. Key output formats for the toolkit include: XHTML PDF (formerly known as PDF2) ODT (Open Document Format) Eclipse Help TocJS (XHTML with a JavaScript frameset) HTML Help Java Help Eclipse Content (normalized DITA plus Eclipse project files) Word RTF (with some limitations) Docbook Troff Toolkit documentation There are two primary sources for documentation about the toolkit. Stable documentation about toolkit usage, parameters, and project management can be found on this page, using the navigation panel on the left. New information about the latest toolkit builds, plans for the next release, and other changing information can be found on the DITA-OT landing page at the dita.xml.org site (link below). That site also contains the release notes for all upcoming and previous releases. Related concepts Distribution packages Related information Main DITA-OT page at dita.xml.org Project News for DITA Open Toolkit Shortcuts to important information DITA-OT stable release DITA-OT latest development build Getting Started with the DITA Open Toolkit The Getting Started Guide is designed to provide a guided exploration of the DITA Open Toolkit.
    [Show full text]
  • Xmlmind DITA Converter Manual
    XMLmind DITA Converter Manual Hussein Shafie XMLmind Software 35, rue Louis Leblanc 78120 Rambouillet France Phone: +33 (0)9 52 80 80 37 [email protected] www.xmlmind.com/ditac/ May 14, 2021 XMLmind DITA Converter Manual Table of Contents List of Figures ............................................................................................................................................. ii List of Tables ............................................................................................................................................. iii Introduction ................................................................................................................................................. iv Part I. Using XMLmind DITA Converter .............................................................................. 1 Chapter 1. Installing XMLmind DITA Converter ......................................................................... 2 1. Contents of the installation directory ...................................................................................... 3 Chapter 2. Getting started ................................................................................................................ 6 1. Using the ditac command-line utility ...................................................................................... 6 Chapter 3. The ditac command-line utility ................................................................................... 14 Chapter 4. XSLT stylesheets parameters .....................................................................................
    [Show full text]
  • Ntent Copyright © 2011 Anthony Self
    Content copyright © 2011 Anthony Self. This content is licensed under the Creative Com- mons Attribution-Noncommercial 3.0 Unported License (http://creativecommons.org/ licenses/by-nc/3.0/) with exceptions noted in the following paragraphs. Contributed samples are owned by their creators, who are credited in the text. Cover design copyright © 2011 Scriptorium Publishing Services, Inc. Scriptorium Press, Scriptorium Publishing Services, and the Scriptorium Press logo are trademarks of Scripto- rium Publishing Services, Inc. All other trademarks used herein are the properties of their respective owners and are used for identification purposes only. Every effort was made to ensure this book is accurate. However, the author and Scriptorium Publishing Services assume no responsibility for errors or omissions, or for use of information in this book. The author and Scriptorium Publishing Services assume no responsibility for third-party content referenced in this book. Published by Scriptorium Press, the imprint of Scriptorium Publishing Services, Inc. For information, contact: Scriptorium Publishing Services, Inc. PO Box 12761 Research Triangle Park, NC 27709-2761 USA Attn: Scriptorium Press www.scriptorium.com/books [email protected] ISBN: 978-0-9828118-1-8 Cover design by Alan S. Pringle and David J. Kelly Index by Tony Self The author developed the source files for the content in DITA. With the exception of the title page and copyright notice, page layout in this book is from Scriptorium’s DITA PDF plugin processed by the Antenna
    [Show full text]
  • Oxygen XML Author Eclipse Plugin 17.1
    Oxygen XML Author Eclipse Plugin 17.1 Notice Copyright Oxygen XML Author plugin User Manual Syncro Soft SRL. Copyright © 2002-2015 Syncro Soft SRL. All Rights Reserved. All rights reserved. No parts of this work may be reproduced in any form or by any means - graphic, electronic, or mechanical, including photocopying, recording, taping, or information storage and retrieval systems - without the written permission of the publisher. Products that are referred to in this document may be either trademarks and/or registered trademarks of the respective owners. The publisher and the author make no claim to these trademarks. Trademarks. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this document, and Syncro Soft SRL was aware of a trademark claim, the designations have been rendered in caps or initial caps. Notice. While every precaution has been taken in the preparation of this document, the publisher and the author assume no responsibility for errors or omissions, or for damages resulting from the use of information contained in this document or from the use of programs and source code that may accompany it. In no event shall the publisher and the author be liable for any loss of profit or any other commercial damage caused or alleged to have been caused directly or indirectly by this document. Link disclaimer. Syncro Soft SRL is not responsible for the contents or reliability of any linked Web sites referenced elsewhere within this documentation, and Syncro Soft SRL does not necessarily endorse the products, services, or information described or offered within them.
    [Show full text]
  • DITA Open Toolkit 1.8.4 This Document Describes the DITA Open Toolkit Project—What the Project Is, and How to Use the Site
    DITA Open Toolkit 1.8.4 This document describes the DITA Open Toolkit project—what the project is, and how to use the site. What is the DITA Open Toolkit? The DITA Open Toolkit, or DITA-OT for short, is a set of Java-based, open source tools that provide processing for DITA maps and topic content. You can download the OT and install it for free on your computer to get started with topic-based writing and publishing. The DITA-OT is licensed under the CPL 1.0 and Apache 2.0 open source licenses. Note: While the DITA Standard itself is owned and developed by OASIS, the DITA Open Toolkit is an independent, open source implementation of the standard. Key output formats for the toolkit include: XHTML PDF (formerly known as PDF2) ODT (Open Document Format) Eclipse Help TocJS (XHTML with a JavaScript frameset) HTML Help Java Help Eclipse Content (normalized DITA plus Eclipse project files) Word RTF (with some limitations) Docbook Troff Toolkit documentation There are two primary sources for documentation about the toolkit. Stable documentation about toolkit usage, parameters, and project management can be found on this page, using the navigation panel on the left. New information about the latest toolkit builds, plans for the next release, and other changing information can be found on the DITA-OT landing page at the dita.xml.org site (link below). That site also contains the release notes for all upcoming and previous releases. Related concepts Distribution packages Related information Main DITA-OT page at dita.xml.org Project News for DITA Open Toolkit Shortcuts to important information DITA-OT stable release DITA-OT latest development build Getting Started with the DITA Open Toolkit The Getting Started Guide is designed to provide a guided exploration of the DITA Open Toolkit.
    [Show full text]
  • XML Prague 2016
    XML Prague 2016 Conference Proceedings University of Economics, Prague Prague, Czech Republic February 11–13, 2016 XML Prague 2016 – Conference Proceedings Copyright © 2016 Jiří Kosek ISBN 978-80-906259-0-7 (pdf) ISBN 978-80-906259-1-4 (ePub) Table of Contents General Information ..................................................................................................... vii Sponsors .......................................................................................................................... ix Preface .............................................................................................................................. xi Born accessible EPUB – Romain Deltour ....................................................................... 1 Extending CSS with XSL-FO, XSL-FO with CSS – Tony Graham ............................ 17 Virtual Document Management – Ari Nordström ..................................................... 33 Define and Conquer – Dr. Patrik Stellmann ................................................................ 49 Subjugating Data Flow Programming – R. Alexander Miłowski and Norman Walsh ................................................................... 65 Schematron QuickFix – Octavian Nadolu and Nico Kutscherauer ............................. 81 Validating office documents in the publishing production workflow – Andrew Sales ................................................................................................................... 99 Data Just Wants to Be Format-Neutral – Steven Pemberton
    [Show full text]
  • Antenna House Regression Testing System User Guide V1.5
    Antenna House Regression Testing System User Guide V1.5 © 2013-2018 Antenna House, Inc. All Rights Reserved About this Publication This publication was created by Michael Hougentogler in Greenville, DE. It was de‐ signed and edited to serve as an example of the capabilities of Antenna House’s “Cloud Authoring Service for the Universal Book” (CAS-UB). Aside from the cover and some additional graphical elements created using Adobe Photoshop, the author limited himself to the features and built-in style sheets inherent to CAS-UB. About the Antenna House Regression Testing System The Antenna House Regression Testing System (AHRTS) is a fast and scalable solution for automating the visual comparison of formatted documents or pages. It is not a solu‐ tion for tracking changes or locating variations in the content of multiple versions of the same document—its main function is to identify the visual dissimilarities that exist be‐ tween a pair or set of documents. That being said, AHRTS does have some capability to locate content discrepancies within a limited set of parameters. AHRTS was originally developed to meet Antenna House’s requirement for regression testing new releases of our Formatter software. We are now making AHRTS available commercially to meet the needs of customers who must also perform regression testing of their output whenever any aspect of their production process has been altered. This in‐ cludes not only the output tool, but also the editing, content management and graphic tools that are used. A slight change in any of the soft- and/or hardware of a system could have unintentional consequences for the integrity of the final output—text, graphics and tables could be altered unexpectedly.
    [Show full text]
  • Conference Proceedings
    XML LONDON 2014 CONFERENCE PROCEEDINGS UNIVERSITY COLLEGE LONDON, LONDON, UNITED KINGDOM JUNE 7–8, 2014 XML London 2014 – Conference Proceedings Published by XML London Copyright © 2014 Charles Foster ISBN 978-0-9926471-1-7 Table of Contents General Information. 7 Sponsors. 8 Preface. 9 Benchmarking XSLT Performance - Michael Kay and Debbie Lockett. 10 Streaming Design Patterns or: How I Learned to Stop Worrying and Love the Stream - Abel Braaksma. 24 From monolithic XML for print/web to lean XML for data: realising linked data for dictionaries - Matt Kohl, Sandro Cirulli and Phil Gooch. 53 XML Processing in Scala - Dino Fancellu and William Narmontas. 63 XML Authoring On Mobile Devices - George Bina. 76 Engineering a XML-based Content Hub for Enterprise Publishing - Elias Weingärtner and Christoph Ludwig. 83 A Visual Comparison Approach to Automated Regression Testing - Celina Huang. 88 Live XML Data - Steven Pemberton. 96 Schematron - More useful than you’d thought - Philip Fennell. 103 Linked Data in a .NET World - Kal Ahmed. 113 Frameless for XML - The Reactive Revolution - Robbert Broersma and Yolijn van der Kolk. 128 Product Usage Schemas - Jorge Luis Williams. 133 An XML-based Approach for Data Preprocessing of Multi-Label Classification Problems - Eduardo Corrêa Gonçalves and Vanessa Braganholo. 148 Using Abstract Content Model and Wikis to link Semantic Web, XML, HTML, JSON and CSV - Lech Rzedzicki. 152 JSON and XML: a new perspective - Eric van der Vlist. 157 MarkLogic- DataLeadershipEvent-A4-v01b.pdf 1 11/21/2013 3:29:18 PM Are you stuck xing YESTERDAY Or are you solving for TOMORROW? The world’s largest banks use MarkLogic to get a 360-degree view of the enterprise, reduce risk and operational costs, and provide better customer service.
    [Show full text]