Pathfinder: Xquery Compilation for Relational Database Targets

Total Page:16

File Type:pdf, Size:1020Kb

Pathfinder: Xquery Compilation for Relational Database Targets Technische Universit¨atM¨unchen d d d d Institut f¨urInformatik d d d d d d d d d Lehrstuhl f¨urDatenbanksysteme d d d d d d d Pathfinder: XQuery Compilation Techniques for Relational Database Targets Jens Thilo Teubner Vollst¨andiger Abdruck der von der Fakult¨atf¨ur Informatik der Technischen Uni- versit¨at M¨unchen zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.) genehmigten Dissertation. Vorsitzender: Univ.-Prof. Dr. Helmut Seidl Pr¨uferder Dissertation: 1. Univ.-Prof. Dr. Torsten Grust 2. Prof. Dr. Martin L. Kersten, Universiteit van Amsterdam/Niederlande Die Dissertation wurde am 10. Mai 2006 bei der Technischen Universit¨atM¨unchen eingereicht und durch die Fakult¨atf¨urInformatik am 28. September 2006 angenom- men. ii This thesis is also available in print (ISBN 3-89963-440-3). iii pathfinder ("pA:T­faInd@) n. a person who makes or finds a way, esp. through unexplored areas or fields of knowledge. Collins English Dictionary [Makins95] iv Abstract Even after at least a decade of work on XML and semi-structured information, such data is still predominantly processed in main-memory, which obviously leads to significant constraints for growing XML document sizes. On the other hand, mature database technologies are readily available to handle vast amounts of data easily. The most efficient ones, relational database manage- ment systems (RDBMSs), however, are largely locked-in to the processing of very regular, table-shaped data only. In this thesis, we will unleash the power of re- lational database technology to the domain of semi-structured data, the world of XML. We will present a purely relational XQuery processor that handles huge amounts of XML data in an efficient and scalable manner. Our setup is based on a relational tree encoding. The XPath accelerator, also known as pre/post numbering, has since become a widely accepted means to effi- ciently store XML data in a relational system. Yet, it has turned out that there is additional performance to gain. A close look into the encoding itself and the deliberate choice of relational indexes will give intriguing insights into the efficient access to node-based tree encodings. The backbone of XML query processing, the access to XML document regions in terms of XPath tree navigation, becomes particularly efficient if the database system is equipped with enhanced, tree-aware algorithms. We will devise staircase join, a novel join operator that encapsulates knowledge on the underlying tree en- coding to provide efficient support for XPath navigation primitives. The required changes to the DBMS kernel remain remarkably small: staircase join integrates well with existing features of relational systems and comes at the cost of adding a single join operator only. The impact on performance, however, is significant: we observed speedups of several orders of magnitude on large-scale XML instances. Although existing XQuery processors make use of the resulting XPath perfor- mance gain by evaluating XPath navigation steps in the DBMS back-end, they tend to perform other core XQuery operations outside the relational database ker- nel. Most notably this affects the FLWOR iteration primitive, XQuery’s node con- struction facilities, and the dynamic type semantics of XQuery. The loop-lifting technique we present in this work deals with these aspects in a purely relational fashion. The outcome is a loop-lifting compiler that translates arbitrary XQuery v vi ABSTRACT expressions into a single relational query plan. Generated plans take particular advantage of the operations that relational databases know how to perform best: relational joins as well as the computation of aggregates. The MonetDB/XQuery system to which this thesis has contributed provides the experimental proof of the effectiveness of our approach. MonetDB/XQuery is one of the fastest and most scalable XQuery engines available today and han- dles queries in the multi-gigabyte range in interactive time. The key to this per- formance are the techniques described in this thesis. They form the basis for MonetDB/XQuery’s core component: the XQuery compiler Pathfinder. Contents Abstract v 1 Introduction 1 1.1 Database Technology for XML ..................... 2 1.1.1 Native XML Databases ..................... 2 1.1.2 Relational Back-Ends ...................... 3 1.2 Contributions of this Thesis ...................... 6 2 Relational XML Storage 11 2.1 XPath Accelerator Encoding ...................... 11 2.1.1 Pre- and Postorder Ranks ................... 12 2.1.2 XPath Axis Conditions ..................... 13 2.1.3 Illustrating XPath Accelerator: The pre/post Plane ..... 14 2.1.4 SQL-Based XPath Evaluation ................. 14 2.1.5 Index Support for XPath Accelerator ............. 16 2.1.6 Techniques to Reduce the Search Space ............ 17 2.1.7 Range Encoding: An Alternative to pre/post ......... 22 2.1.8 A Word on Updates ...................... 23 2.2 XPath on Commodity RDBMSs .................... 23 2.2.1 DB2 Runs XPath ........................ 25 2.2.2 XPath Accelerator on PostgreSQL .............. 31 2.3 Related Work .............................. 32 2.3.1 Fixed-Length Encodings .................... 33 2.3.2 Variable-Length Encodings ................... 35 2.3.3 Relational Database Support .................. 36 3 XPath Evaluation with Staircase Join 39 3.1 Re-Inspecting XPath Accelerator ................... 39 3.1.1 Node Distribution in the pre/post Plane ........... 40 3.2 Staircase Join .............................. 41 3.2.1 Pruning ............................. 41 vii viii CONTENTS 3.2.2 Empty Regions in the pre/post Plane ............. 44 3.2.3 Partitioning ........................... 46 3.2.4 A Further Increase of Tree Awareness: Skipping ....... 49 3.3 Implementation Considerations .................... 51 3.3.1 A Disk-Based ¢ Implementation ............... 51 3.3.2 Main Memory-Related Adaptions ............... 56 3.4 Tree Awareness Beyond Staircase Join ................ 62 3.4.1 Loop-Lifting Staircase Join ................... 62 3.4.2 Support for Non-Recursive Axes ................ 65 3.4.3 Staircase Join Without Staircase Join ............. 67 3.4.4 Tree Awareness in Other Domains ............... 68 3.5 Related Work .............................. 69 3.5.1 Path Evaluation on RDBMSs ................. 69 3.5.2 Tree Properties in XPath .................... 70 4 Loop-Lifting: From XPath to XQuery 73 4.1 A Relational Algebra for XQuery ................... 74 4.1.1 Relational Sequence Encoding ................. 74 4.1.2 An Algebra for XQuery ..................... 77 4.1.3 A Ruleset to Compile XQuery ................. 79 4.1.4 Basic XQuery Expressions ................... 80 4.1.5 Sequence Construction ..................... 81 4.2 Relational FLWORs ............................ 82 4.2.1 for-Bound Variables ...................... 83 4.2.2 Maintaining loop ........................ 85 4.2.3 Free Variables in the return Clause .............. 85 4.2.4 Mapping Back .......................... 87 4.2.5 Complete Compilation Rule for FLWOR Expressions ..... 88 4.2.6 Optional: The order by Clause ................ 89 4.3 Other Expression Types ........................ 90 4.3.1 Arithmetics/Comparisons ................... 90 4.3.2 Conditionals: if-then-else .................. 91 4.4 Interfacing with XML/XPath ..................... 93 4.4.1 XPath Location Steps ..................... 93 4.4.2 Element Construction ..................... 98 4.4.3 A Note on Side-Effects ..................... 100 4.5 Support for Dynamic Type Tests ................... 102 4.5.1 XQuery Subtype Semantics .................. 102 4.5.2 Sequence Type Matching on Relational Back-Ends ..... 103 4.6 XQuery on DB2 ............................. 107 4.6.1 A Loop-Lifted XQuery-to-SQL Translation .......... 107 CONTENTS ix 4.6.2 XPath Bundling and Use of OLAP Functionality ...... 108 4.6.3 Live Node Sets: Compile-Time Information for Accelerated Query Evaluation ........................ 109 4.6.4 XMark on DB2 ......................... 111 4.7 Wrap-Up ................................. 112 4.7.1 Related Research ........................ 113 4.7.2 Outlook & Perspective ..................... 115 5 The Pathfinder XQuery Compiler 119 5.1 Logical Optimizations in Pathfinder .................. 120 5.1.1 DAGs for Loop-Lifted Query Plans .............. 120 5.1.2 A Peephole-Style Plan Analysis ................ 120 5.1.3 Robust XQuery Join Detection ................ 124 5.2 The Importance of Order ........................ 127 5.2.1 Order in Loop-Lifted XQuery ................. 127 5.2.2 Order Indifference in XQuery ................. 128 5.2.3 A Performance Advantage can be Realized .......... 131 5.2.4 Physical Optimization and Order Awareness ......... 131 5.3 Cardinality Forecasts for Loop-Lifted Plans .............. 133 5.3.1 Statistical Guide ........................ 134 5.3.2 Cardinality Forecasts ...................... 135 5.4 MonetDB/XQuery ........................... 137 5.4.1 System Architecture ...................... 138 5.4.2 Overall Query Performance .................. 139 5.4.3 Order Awareness in Pathfinder ................ 140 5.4.4 Scalability with Respect to Data Volumes .......... 141 5.4.5 XQuery on High Data Volumes ................ 142 5.5 Research in the Neighborhood ..................... 142 5.5.1 Algebraic Optimization for XQuery .............. 143 5.5.2 Order Awareness ........................ 144 5.5.3 XQuery Cardinality Forecasts ................. 145 5.5.4 Further Optimization Hooks .................. 145 6 Wrap-Up 147 6.1 Summary ...............................
Recommended publications
  • A Comparison of On-Line Computer Science Citation Databases
    A Comparison of On-Line Computer Science Citation Databases Vaclav Petricek1,IngemarJ.Cox1,HuiHan2, Isaac G. Councill3, and C. Lee Giles3 1 University College London, WC1E 6BT, Gower Street, London, United Kingdom [email protected], [email protected] 2 Yahoo! Inc., 701 First Avenue, Sunnyvale, CA, 94089 [email protected] 3 The School of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA [email protected] [email protected] Abstract. This paper examines the difference and similarities between the two on-line computer science citation databases DBLP and CiteSeer. The database entries in DBLP are inserted manually while the CiteSeer entries are obtained autonomously via a crawl of the Web and automatic processing of user submissions. CiteSeer’s autonomous citation database can be considered a form of self-selected on-line survey. It is important to understand the limitations of such databases, particularly when citation information is used to assess the performance of authors, institutions and funding bodies. We show that the CiteSeer database contains considerably fewer single author papers. This bias can be modeled by an exponential process with intuitive explanation. The model permits us to predict that the DBLP database covers approximately 24% of the entire literature of Computer Science. CiteSeer is also biased against low-cited papers. Despite their difference, both databases exhibit similar and signif- icantly different citation distributions compared with previous analysis of the Physics community. In both databases, we also observe that the number of authors per paper has been increasing over time. 1 Introduction Several public1 databases of research papers became available due to the advent of the Web [1,22,5,3,2,4,8] These databases collect papers in different scientific disciplines, index them and annotate them with additional metadata.
    [Show full text]
  • Linking Educational Resources on Data Science
    The Thirty-First AAAI Conference on Innovative Applications of Artificial Intelligence (IAAI-19) Linking Educational Resources on Data Science Jose´ Luis Ambite,1 Jonathan Gordon,2∗ Lily Fierro,1 Gully Burns,1 Joel Mathew1 1USC Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, California 90292, USA 2Department of Computer Science, Vassar College, 124 Raymond Ave, Poughkeepsie, New York 12604, USA [email protected], [email protected], flfierro, burns, [email protected] Abstract linked data (Heath and Bizer 2011) with references to well- known entities in DBpedia (Auer et al. 2007), DBLP (Ley The availability of massive datasets in genetics, neuroimag- 2002), and ORCID (orcid.org). Here, we detail the meth- ing, mobile health, and other subfields of biology and ods applied to specific problems in building ERuDIte. First, medicine promises new insights but also poses significant we provide an overview of our project. In later sections, we challenges. To realize the potential of big data in biomedicine, the National Institutes of Health launched the Big Data to describe our resource collection effort, the resource schema, Knowledge (BD2K) initiative, funding several centers of ex- the topic ontology, our linkage approach, and the linked data cellence in biomedical data analysis and a Training Coordi- resource provided. nating Center (TCC) tasked with facilitating online and in- The National Institutes of Health (NIH) launched the Big person training of biomedical researchers in data science. A Data to Knowledge (BD2K) initiative to fulfill the promise major initiative of the BD2K TCC is to automatically identify, of biomedical “big data” (Ohno-Machado 2014). The NIH describe, and organize data science training resources avail- BD2K program funded 15 major centers to investigate how able on the Web and provide personalized training paths for data science can benefit diverse fields of biomedical research users.
    [Show full text]
  • Rdfa in XHTML: Syntax and Processing Rdfa in XHTML: Syntax and Processing
    RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing A collection of attributes and processing rules for extending XHTML to support RDF W3C Recommendation 14 October 2008 This version: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 Latest version: http://www.w3.org/TR/rdfa-syntax Previous version: http://www.w3.org/TR/2008/PR-rdfa-syntax-20080904 Diff from previous version: rdfa-syntax-diff.html Editors: Ben Adida, Creative Commons [email protected] Mark Birbeck, webBackplane [email protected] Shane McCarron, Applied Testing and Technology, Inc. [email protected] Steven Pemberton, CWI Please refer to the errata for this document, which may include some normative corrections. This document is also available in these non-normative formats: PostScript version, PDF version, ZIP archive, and Gzip’d TAR archive. The English version of this specification is the only normative version. Non-normative translations may also be available. Copyright © 2007-2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported - 1 - How to Read this Document RDFa in XHTML: Syntax and Processing into a user’s desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo’s creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.
    [Show full text]
  • Bibliography of Erik Wilde
    dretbiblio dretbiblio Erik Wilde's Bibliography References [1] AFIPS Fall Joint Computer Conference, San Francisco, California, December 1968. [2] Seventeenth IEEE Conference on Computer Communication Networks, Washington, D.C., 1978. [3] ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, Los Angeles, Cal- ifornia, March 1982. ACM Press. [4] First Conference on Computer-Supported Cooperative Work, 1986. [5] 1987 ACM Conference on Hypertext, Chapel Hill, North Carolina, November 1987. ACM Press. [6] 18th IEEE International Symposium on Fault-Tolerant Computing, Tokyo, Japan, 1988. IEEE Computer Society Press. [7] Conference on Computer-Supported Cooperative Work, Portland, Oregon, 1988. ACM Press. [8] Conference on Office Information Systems, Palo Alto, California, March 1988. [9] 1989 ACM Conference on Hypertext, Pittsburgh, Pennsylvania, November 1989. ACM Press. [10] UNIX | The Legend Evolves. Summer 1990 UKUUG Conference, Buntingford, UK, 1990. UKUUG. [11] Fourth ACM Symposium on User Interface Software and Technology, Hilton Head, South Carolina, November 1991. [12] GLOBECOM'91 Conference, Phoenix, Arizona, 1991. IEEE Computer Society Press. [13] IEEE INFOCOM '91 Conference on Computer Communications, Bal Harbour, Florida, 1991. IEEE Computer Society Press. [14] IEEE International Conference on Communications, Denver, Colorado, June 1991. [15] International Workshop on CSCW, Berlin, Germany, April 1991. [16] Third ACM Conference on Hypertext, San Antonio, Texas, December 1991. ACM Press. [17] 11th Symposium on Reliable Distributed Systems, Houston, Texas, 1992. IEEE Computer Society Press. [18] 3rd Joint European Networking Conference, Innsbruck, Austria, May 1992. [19] Fourth ACM Conference on Hypertext, Milano, Italy, November 1992. ACM Press. [20] GLOBECOM'92 Conference, Orlando, Florida, December 1992. IEEE Computer Society Press. http://github.com/dret/biblio (August 29, 2018) 1 dretbiblio [21] IEEE INFOCOM '92 Conference on Computer Communications, Florence, Italy, 1992.
    [Show full text]
  • RDF/XML: RDF Data on the Web
    Developing Ontologies • have an idea of the required concepts and relationships (ER, UML, ...), • generate a (draft) n3 or RDF/XML instance, • write a separate file for the metadata, • load it into Jena with activating a reasoner. • If the reasoner complains about an inconsistent ontology, check the metadata file alone. If this is consistent, and it complains only when also data is loaded: – it may be due to populating a class whose definition is inconsistent and that thus must be empty. – often it is due to wrong datatypes. Recall that datatype specification is not interpreted as a constraint (that is violated for a given value), but as additional knowledge. 220 Chapter 6 RDF/XML: RDF Data on the Web • An XML representation of RDF data for providing RDF data on the Web could be done straightforwardly as a “holds” relation mapped according to SQLX (see ⇒ next slide). • would be highly redundant and very different from an XML representation of the same data • search for a more similar way: leads to “striped XML/RDF” – data feels like XML: can be queried by XPath/Query and transformed by XSLT – can be parsed into an RDF graph. • usually: provide RDF/XML data to an agreed RDFS/OWL ontology. 221 A STRAIGHTFORWARD XML REPRESENTATION OF RDF DATA Note: this is not RDF/XML, but just some possible representation. • RDF data are triples, • their components are either URIs or literals (of XML Schema datatypes), • straightforward XML markup in SQLX style, • since N3 has a term structure, it is easy to find an XML markup. <my-n3:rdf-graph xmlns:my-n3="http://simple-silly-rdf-xml.de#"> <my-n3:triple> <my-n3:subject type="uri">foo://bar/persons/john</my-n3:subject> <my-n3:predicate type="uri">foo://bar/meta#name</my-n3:predicate> <my-n3:object type="http://www.w3.org/2001/XMLSchema#string">John</my-n3:object> </my-n3 triple> <my-n3:triple> ..
    [Show full text]
  • Analysis of Computer Science Communities Based on DBLP
    Analysis of Computer Science Communities Based on DBLP Maria Biryukov1 and Cailing Dong1,2 1 Faculty of Science, Technology and Communications, MINE group, University of Luxembourg, Luxembourg [email protected] 2 School of Computer Science and Technology, Shandong University, Jinan, 250101, China cailing [email protected] Abstract. It is popular nowadays to bring techniques from bibliometrics and scientometrics into the world of digital libraries to explore mecha- nisms which underlie community development. In this paper we use the DBLP data to investigate the author’s scientific career, and analyze some of the computer science communities. We compare them in terms of pro- ductivity and population stability, and use these features to compare the sets of top-ranked conferences with their lower ranked counterparts.1 Keywords: bibliographic databases, author profiling, scientific commu- nities, bibliometrics. 1 Introduction Computer science is a broad and constantly growing field. It comprises various subareas each of which has its own specialization and characteristic features. While different in size and granularity, research areas and conferences can be thought of as scientific communities that bring together specialists sharing simi- lar interests. What is specific about conferences is that in addition to scope, par- ticipating scientists and regularity, they are also characterized by level. In each area there is a certain number of commonly agreed upon top ranked venues, and many others – with the lower rank or unranked. In this work we aim at finding out how the communities represented by different research fields and conferences are evolving and communicating to each other. To answer this question we sur- vey the development of the author career, compare various research areas to each other, and finally, try to identify features that would allow to distinguish between venues of different rank.
    [Show full text]
  • Relational Algebra
    Relational Algebra Instructor: Shel Finkelstein Reference: A First Course in Database Systems, 3rd edition, Chapter 2.4 – 2.6, plus Query Execution Plans Important Notices • Midterm with Answers has been posted on Piazza. – Midterm will be/was reviewed briefly in class on Wednesday, Nov 8. – Grades were posted on Canvas on Monday, Nov 13. • Median was 83; no curve. – Exam will be returned in class on Nov 13 and Nov 15. • Please send email if you want “cheat sheet” back. • Lab3 assignment was posted on Sunday, Nov 5, and is due by Sunday, Nov 19, 11:59pm. – Lab3 has lots of parts (some hard), and is worth 13 points. – Please attend Labs to get help with Lab3. What is a Data Model? • A data model is a mathematical formalism that consists of three parts: 1. A notation for describing and representing data (structure of the data) 2. A set of operations for manipulating data. 3. A set of constraints on the data. • What is the associated query language for the relational data model? Two Query Languages • Codd proposed two different query languages for the relational data model. – Relational Algebra • Queries are expressed as a sequence of operations on relations. • Procedural language. – Relational Calculus • Queries are expressed as formulas of first-order logic. • Declarative language. • Codd’s Theorem: The Relational Algebra query language has the same expressive power as the Relational Calculus query language. Procedural vs. Declarative Languages • Procedural program – The program is specified as a sequence of operations to obtain the desired the outcome. I.e., how the outcome is to be obtained.
    [Show full text]
  • XML for Java Developers G22.3033-002 Course Roadmap
    XML for Java Developers G22.3033-002 Session 1 - Main Theme Markup Language Technologies (Part I) Dr. Jean-Claude Franchitti New York University Computer Science Department Courant Institute of Mathematical Sciences 1 Course Roadmap Consider the Spectrum of Applications Architectures Distributed vs. Decentralized Apps + Thick vs. Thin Clients J2EE for eCommerce vs. J2EE/Web Services, JXTA, etc. Learn Specific XML/Java “Patterns” Used for Data/Content Presentation, Data Exchange, and Application Configuration Cover XML/Java Technologies According to their Use in the Various Phases of the Application Development Lifecycle (i.e., Discovery, Design, Development, Deployment, Administration) e.g., Modeling, Configuration Management, Processing, Rendering, Querying, Secure Messaging, etc. Develop XML Applications as Assemblies of Reusable XML- Based Services (Applications of XML + Java Applications) 2 1 Agenda XML Generics Course Logistics, Structure and Objectives History of Meta-Markup Languages XML Applications: Markup Languages XML Information Modeling Applications XML-Based Architectures XML and Java XML Development Tools Summary Class Project Readings Assignment #1a 3 Part I Introduction 4 2 XML Generics XML means eXtensible Markup Language XML expresses the structure of information (i.e., document content) separately from its presentation XSL style sheets are used to convert documents to a presentation format that can be processed by a target presentation device (e.g., HTML in the case of legacy browsers) Need a
    [Show full text]
  • How Do Computer Scientists Use Google Scholar?: a Survey of User Interest in Elements on Serps and Author Profile Pages
    BIR 2019 Workshop on Bibliometric-enhanced Information Retrieval How do Computer Scientists Use Google Scholar?: A Survey of User Interest in Elements on SERPs and Author Profile Pages Jaewon Kim, Johanne R. Trippas, Mark Sanderson, Zhifeng Bao, and W. Bruce Croft RMIT University, Melbourne, Australia fjaewon.kim, johanne.trippas, mark.sanderson, zhifeng.bao, [email protected] Abstract. In this paper, we explore user interest in elements on Google Scholar's search engine result pages (SERPs) and author profile pages (APPs) through a survey in order to predict search behavior. We inves- tigate the effects of different query intents (keyword and title) on SERPs and research area familiarities (familiar and less-familiar) on SERPs and APPs. Our findings show that user interest is affected by the re- spondents' research area familiarity, whereas there is no effect due to the different query intents, and that users tend to distribute different levels of interest to elements on SERPs and APPs. We believe that this study provides the basis for understanding search behavior in using Google Scholar. Keywords: Academic search · Google Scholar · Survey study 1 Introduction Academic search engines (ASEs) such as Google Scholar1 and Microsoft Aca- demic2, which are specialized for searching scholarly literature, are now com- monly used to find relevant papers. Among the ASEs, Google Scholar covers about 80-90% of the publications in the world [7] and also provides individual profiles with three types of citation namely, total citation number, h-index, and h10-index. Although the Google Scholar website3 states that the ranking is de- cided by authors, publishers, numbers and times of citation, and even considers the full text of each document, there is a phenomenon called the Google Scholar effect [11].
    [Show full text]
  • Odata-Csdl-Xml-V4.01-Os.Pdf
    OData Common Schema Definition Language (CSDL) XML Representation Version 4.01 OASIS Standard 11 May 2020 This stage: https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/os/odata-csdl-xml-v4.01-os.docx (Authoritative) https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/os/odata-csdl-xml-v4.01-os.html https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/os/odata-csdl-xml-v4.01-os.pdf Previous stage: https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/cos01/odata-csdl-xml-v4.01-cos01.docx (Authoritative) https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/cos01/odata-csdl-xml-v4.01-cos01.html https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/cos01/odata-csdl-xml-v4.01-cos01.pdf Latest stage: https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/odata-csdl-xml-v4.01.docx (Authoritative) https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/odata-csdl-xml-v4.01.html https://docs.oasis-open.org/odata/odata-csdl-xml/v4.01/odata-csdl-xml-v4.01.pdf Technical Committee: OASIS Open Data Protocol (OData) TC Chairs: Ralf Handl ([email protected]), SAP SE Michael Pizzo ([email protected]), Microsoft Editors: Michael Pizzo ([email protected]), Microsoft Ralf Handl ([email protected]), SAP SE Martin Zurmuehl ([email protected]), SAP SE Additional artifacts: This prose specification is one component of a Work Product that also includes: • XML schemas: OData EDMX XML Schema and OData EDM XML Schema.
    [Show full text]
  • PP-DBLP: Modeling and Generating Public-Private Attributed Networks with DBLP
    PP-DBLP: Modeling and Generating Public-Private Attributed Networks with DBLP Xin Huang∗, Jiaxin Jiang∗, Byron Choi∗, Jianliang Xu∗, Zhiwei Zhang∗, Yunya Song# ∗Department of Computer Science, Hong Kong Baptist University #Department of Journalism, Hong Kong Baptist University ∗fxinhuang, jxjian, bchoi, xujl, [email protected], #[email protected] Abstract—In many online social networks (e.g., Facebook, graph is visible only to the corresponding user. From each Google+, Twitter, and Instagram), users prefer to hide her/his users viewpoint, the social network is exactly the union of the partial or all relationships, which makes such private relation- public graph and her/his own private graph. Several sketching ships not visible to public users or even friends. This leads to a new graph model called public-private networks, where each and sampling approaches [2] have been proposed to address user has her/his own perspective of the network including the essential problems of graph processing, such as the size of private connections. Recently, public-private network analysis has reachability tree [5], all-pair shortest paths [6], pairwise node attracted significant research interest in the literature. A great similarities [7], correlation clustering [8] and so on. deal of important graph computing problems (e.g., shortest paths, centrality, PageRank, and reachability tree) has been studied. In social networks, vertices usually contain attributes, e.g., However, due to the limited data sources and privacy concerns, a person has information including name, interests, skills, proposed approaches are not tested on real-world datasets, but and so on. Recent study [2] focuses on one essential aspect on synthetic datasets by randomly selecting vertices as private of topological structure of public-private graphs.
    [Show full text]
  • XML: Looking at the Forest Instead of the Trees Guy Lapalme Professor Département D©Informatique Et De Recherche Opérationnelle Université De Montréal
    XML: Looking at the Forest Instead of the Trees Guy Lapalme Professor Département d©informatique et de recherche opérationnelle Université de Montréal C.P. 6128, Succ. Centre-Ville Montréal, Québec Canada H3C 3J7 [email protected] http://www.iro.umontreal.ca/~lapalme/ForestInsteadOfTheTrees/ Publication date April 14, 2019 XML to PDF by RenderX XEP XSL-FO Formatter, visit us at http://www.renderx.com/ XML: Looking at the Forest Instead of the Trees Guy Lapalme Professor Département d©informatique et de recherche opérationnelle Université de Montréal C.P. 6128, Succ. Centre-Ville Montréal, Québec Canada H3C 3J7 [email protected] http://www.iro.umontreal.ca/~lapalme/ForestInsteadOfTheTrees/ Publication date April 14, 2019 Abstract This tutorial gives a high-level overview of the main principles underlying some XML technologies: DTD, XML Schema, RELAX NG, Schematron, XPath, XSL stylesheets, Formatting Objects, DOM, SAX and StAX models of processing. They are presented from the point of view of the computer scientist, without the hype too often associated with them. We do not give a detailed description but we focus on the relations between the main ideas of XML and other computer language technologies. A single compact pretty-print example is used throughout the text to illustrate the processing of an XML structure with XML technologies or with Java programs. We also show how to create an XML document by programming in Java, in Ruby, in Python, in PHP, in E4X (Ecmascript for XML) and in Swift. The source code of the example XML ®les and the programs are available either at the companion web site of this document or by clicking on the ®le name within brackets at the start of the caption of each example.
    [Show full text]