XML Prague 2018 – Conference Proceedings Copyright © 2018 Jiří Kosek


XML Prague 2018
Conference Proceedings

University of Economics, Prague
Prague, Czech Republic
February 8–10, 2018

ISBN 978-80-906259-4-5 (pdf)
ISBN 978-80-906259-5-2 (ePub)

Table of Contents

General Information
Sponsors
Preface
Assisted Structured Authoring using Conditional Random Fields – Bert Willems
XML Success Story: Creating and Integrating Collaboration Solutions to Improve the Documentation Process – Steven Higgs
xqerl: XQuery 3.1 Implementation in Erlang – Zachary N. Dean
XML Tree Models for Efficient Copy Operations – Michael Kay
Using Maven with XML development projects – Christophe Marchand and Matthieu Ricaud-Dussarget
Varieties of XML Merge: Concurrent versus Sequential – Tejas Pradip Barhate and Nigel Whitaker
Including XML Markup in the Automated Collation of Literary Text – Elli Bleeker, Bram Buitendijk, Ronald Haentjens Dekker, and Astrid Kulsdom
Multi-Layer Content Modelling to the Rescue – Erik Siegel
Combining graph and tree – Hans-Juergen Rennau
SML – A simpler and shorter representation of XML – Jean-François Larvoire
Can we create a real world rich Internet application using Saxon-JS? – Pieter Masereeuw
Implementing XForms using interactive XSLT 3.0 – O'Neil Delpratt and Debbie Lockett
Life, the Universe, and CSS Tests – Tony Graham
Form, and Content – Steven Pemberton
tokenized-to-tree – Gerrit Imsieke

General Information

Date
February 8th, 9th and 10th, 2018

Location
University of Economics, Prague (UEP)
nám. W. Churchilla 4, 130 67 Prague 3, Czech Republic

Organizing Committee
Petr Cimprich, XML Prague, z.s.
Vít Janota, Xyleme & XML Prague, z.s.
Káťa Kabrhelová, XML Prague, z.s.
Jirka Kosek, xmlguru.cz & XML Prague, z.s. & University of Economics, Prague
Martin Svárovský, Memsource & XML Prague, z.s.
Mohamed Zergaoui, ShareXML.com & Innovimax

Program Committee
Robin Berjon, The New York Times
Petr Cimprich, Xyleme
Jim Fuller, MarkLogic
Michael Kay, Saxonica
Jirka Kosek (chair), University of Economics, Prague
Ari Nordström, SGMLGuru.org
Uche Ogbuji, Zepheira LLC
Adam Retter, Evolved Binary
Andrew Sales, Bloomsbury Publishing plc
Felix Sasaki, Cornelsen GmbH
John Snelson, MarkLogic
Jeni Tennison, Open Data Institute
Eric van der Vlist, Dyomedea
Priscilla Walmsley, Datypic
Norman Walsh, MarkLogic
Mohamed Zergaoui, Innovimax

Produced By
XML Prague, z.s. (http://xmlprague.cz/about)
Faculty of Informatics and Statistics, UEP (http://fis.vse.cz)

Sponsors

oXygen (http://www.oxygenxml.com)
le-tex publishing services (http://www.le-tex.de/en/)
Antenna House (http://www.antennahouse.com/)
Saxonica (http://www.saxonica.com/)
speedata (http://www.speedata.de/)
Syntea (http://www.syntea.cz/)

Preface

This publication contains papers presented during the XML Prague 2018 conference.

In its thirteenth year, XML Prague is a conference on XML for developers, markup geeks, information managers, and students. XML Prague focuses on markup and semantics on the Web, publishing and digital books, XML technologies for Big Data and recent advances in XML technologies. The conference provides an overview of successful technologies, with a focus on real-world application rather than theoretical exposition.

The conference takes place on 8–10 February 2018 at the campus of the University of Economics in Prague. XML Prague 2018 is jointly organized by the non-profit organization XML Prague, z.s. and by the Faculty of Informatics and Statistics, University of Economics in Prague.

The full program of the conference is broadcast over the Internet (see http://xmlprague.cz), allowing XML fans from around the world to participate online.

Thursday runs in an unconference style, which provides space for various XML community meetings in parallel tracks.
Friday and Saturday are devoted to the classical single-track format, and papers from these days are published in the proceedings. Additionally, we coordinate, support and provide space for an XProc working group meeting collocated with XML Prague.

We hope that you enjoy XML Prague 2018, especially as this is a very special edition: the last day of the conference is the 20th anniversary of the publication of the XML Recommendation.

Petr Cimprich & Jirka Kosek & Mohamed Zergaoui
XML Prague Organizing Committee

Assisted Structured Authoring using Conditional Random Fields

Bert Willems
FontoXML
<[email protected]>

Abstract

Authoring structured content with rich semantic markup is repetitive, time-consuming and error-prone. Many Subject Matter Experts (SMEs) struggle with the task of applying the correct markup. This paper proposes a mechanism to partially automate this task using Conditional Random Fields (CRF), a machine learning algorithm. It also proposes an architecture for continuously improving the CRF model in production using a feedback loop.

Keywords: XML, Conditional Random Fields, Structured Authoring, Machine Learning

1. Introduction

With the increasing adoption of structured XML content, the amount of work required from Subject Matter Experts (SMEs) increases. Not only are they required to capture their knowledge as information for others, they are increasingly asked, and sometimes even required, to mark up the information with the appropriate semantic and structural metadata in the form of XML tags and attributes. Examples of those markup tasks include:

• Structuring bibliographic references to tag authors, journal name, publisher etc.
• Marking up tasks, not with ordered lists but with steps.
• Marking up interactive questions, like multiple choice questions.

Although WYSIWYG XML editors help to make this task as easy as possible, the fact remains that there is additional work to be done that is often repetitive and error-prone.
FontoXML conducted multiple studies to determine whether the effort of manual tagging affected adoption. The results showed a consistent negative effect: SMEs and their editorial colleagues are hesitant to adopt structured authoring. In some cases this meant reverting to their unstructured content processes, leading to unrealized potential.

Prior implementations, like GROBID [3], apply markup automatically. This paper proposes to introduce Machine Learning (ML) to the authoring process instead. The reason for this is the inaccuracy of state-of-the-art ML algorithms: like humans, they make mistakes. Allowing SMEs to (correct and) accept a machine-provided suggestion will result in more accurate markup. Furthermore, this approach allows for the creation of a feedback loop, allowing the machine to improve over time.

This paper focuses on the task of structuring bibliographic citations, although the proposed architecture scales to many of the tasks required for properly structured content.

2. Model

This section describes the model used for recognizing bibliographic citations and extracting the relevant labels from them. The model used in this paper follows a divide-and-conquer strategy and is made up of two separate models: the Citation Model and the Name Model. Partial results from the Citation Model cascade into the Name Model to produce more detailed results.

2.1. Citation Model

The goal of the Citation Model is to classify a sequence of text with tags that make up the parts of the citation. The tags are derived from the TEI P5 vocabulary [12] and are encoded using the IOB tagging scheme [8]. The following tags are distinguished:

• author
• orgName
• editor
• publisher
• pubPlace
• date
• idno (bibliographic identifier)
• analytic (articles, poems, etc.)
• monographic (books, single & multi volumes, etc.)
• journal
• series
• unpublished
• volume
• issue
• pages
• chapter

For example, the sequence Erickson, T. & Kellogg, W. A. "Social
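
The IOB encoding named in Section 2.1 can be illustrated with a minimal sketch. In IOB, the first token of a labelled span is tagged B-<label>, each following token of the same span is tagged I-<label>, and tokens outside any span are tagged O. Everything below is an assumption made for illustration: the function name to_iob, the (start, end, label) span format, the tokenization, and the choice to treat each name as its own author span are not taken from the paper, and the paper's own segmentation of this sequence is not shown in this excerpt.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, label), token indices, end exclusive.
Span = Tuple[int, int, str]


def to_iob(tokens: List[str], spans: List[Span]) -> List[Tuple[str, str]]:
    """Pair each token with its IOB tag, given non-overlapping labelled spans."""
    tags = ["O"] * len(tokens)  # tokens outside every span stay "O"
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span out of range: {(start, end, label)}")
        tags[start] = "B-" + label          # first token of the span
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # remaining tokens of the span
    return list(zip(tokens, tags))


# Assumed tokenization of the author part of the citation quoted above.
tokens = ["Erickson", ",", "T.", "&", "Kellogg", ",", "W.", "A."]
# Assumed labelling: two separate author spans, with "&" left outside.
spans = [(0, 3, "author"), (4, 8, "author")]

for token, tag in to_iob(tokens, spans):
    print(f"{token}\t{tag}")
```

A sequence labeller such as a CRF would be trained to predict the tag column from features of the token column; the B- prefix is what lets two adjacent spans with the same label be told apart.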