XML Document Processing: Theory and Practice

Course Number: 725 U3420, Fall 2008

Place and Time
Room 309 in Common Courses Building, Wednesday 14:30 – 17:20.

Course Web Site
http://www.iis.sinica.edu.tw/~trc/public/courses/Fall2008/

Instructor
Tyng-Ruey Chuang (Associate Research Fellow, Institute of Information Science, Academia Sinica)
E-mail: [email protected], Website: http://www.iis.sinica.edu.tw/~trc/
Office hours: Monday 4:50 – 6:40 pm at Room 716 in Management Building I.
Office phone numbers: 02-3366-1200 ext. 20 (Monday 4:50 – 6:40 pm only), 02-2788-3799 ext. 1608 (Academia Sinica).

Teaching Assistant
Chin-Lung Chang
E-mail: [email protected], Tel: (02) 2788 3799 ext. 1663 (Academia Sinica).

Schedule
(Part 1) 09/17 Document markup languages; HTML and SVG examples; Why XML?
(Part 1) 09/24 XML Recommendation: DTD (Document Type Definition); Case study: RSS 0.91, QAML, XBEL.
(Part 1) 10/01 XML Recommendation: language code, character set, text encoding; XML Namespace.
(Part 1) 10/08 (Guest Lecturer: Yi-Hong Chang) CSS (Cascading Style Sheets).
(Part 2) 10/15 XPath (XML Path Language).
(Part 2) 10/22 XSLT (Extensible Stylesheet Language Transformations).
(Part 2) 10/29 DOM (Document Object Model).
(Part 2) 11/05 XML in Web programming: server-side/client-side XSLT, client-side ECMAScript, etc.
(Part 3) 11/12 XML DTD revisited; Case study: Ruby, XBEL, XHTML Modularization.
(Part 3) 11/19 (Guest Lecturer: Chin-Lung Chang) XML Schema: Structures & Datatypes.
(Part 3) 11/26 Relax NG; Case study: Atom.
(Part 3) 12/03 Comparisons of XML schema languages; issues in XML document validation.
(Part 3) 12/10 RDF (Resource Description Framework).
(Part 4) 12/17 (Project proposal due) Balanced context-free grammars.
(Part 4) 12/24 Regular tree grammars; tree automata.
(Part 4) 12/31 Validated XML processors; Case study: CDuce.
(Part 4) 01/07 XML research topics.
(Part 5) 01/14 (Project presentation)

Reference Book (Recommended)
Elliotte Rusty Harold & W. Scott Means. XML in a Nutshell, 3rd edition. Sept. 2004, O'Reilly Media, Inc.

Grade
Homework (60%) and project (40%).