XML Normal Form (XNF)


Ryan Marcotte
www.cs.uregina.ca/~marcottr
CS 475 (Advanced Topics in Databases)
March 14, 2011

Outline
• Introduction to XNF and motivation for its creation
• Analysis of XNF’s link to BCNF
• Algorithm for converting a DTD to XNF
• Example

Introduction
• XML is used for data storage and exchange
• Data is stored in a hierarchical fashion
• Duplicates and inconsistencies may exist in the data store
• Relational databases store data according to some schema
• XML also stores data according to some schema, such as a Document Type Definition (DTD)
• Some schemas are better than others
• A normal form is needed that reduces the amount of storage needed while ensuring consistency and eliminating redundancy
• XNF was proposed by Marcelo Arenas and Leonid Libkin (University of Toronto) in a 2004 paper titled “A Normal Form for XML Documents”
• The authors recognized a need for good XML data design as “a lot of data is being put on the web”
• “Once massive web databases are created, it is very hard to change their organization; thus, there is a risk of having large amounts of widely accessible, but at the same time poorly organized legacy data.”
• XNF provides a set of rules that describe well-designed DTDs
• Poorly designed DTDs can be transformed into well-designed ones (through normalization, just like relational databases!)
• Well-designed DTDs avoid redundancies and update anomalies

Review of Basic Terms
• Recall the definition of functional dependencies (FDs)
• Given a relation schema R, a set of attributes X is said to functionally determine another set of attributes Y (also in R), written X → Y, if and only if each value of X is associated with exactly one value of Y (any two tuples that agree on X also agree on Y)
• F+ is the closure of a set of FDs F: every FD that can be derived from F using Armstrong’s axioms:
  • reflexivity (if Y ⊆ X, then X → Y)
  • augmentation (if X → Y, then XZ → YZ)
  • transitivity (if X → Y and Y → Z, then X → Z)
• Every set of FDs has a canonical cover (a minimal set of FDs such that all other FDs can be derived using the above axioms)
• An element represents a node in the XML tree and includes everything from its start tag to its end tag
• An attribute provides additional information about an element; in paths, attribute names begin with @
• A path in an XML document is a sequence of element names separated by periods, ending with an element name or an attribute name

  <!DOCTYPE student_list [
    <!ELEMENT student_list (student)*>
    <!ELEMENT student (first_name, last_name)>
    <!ELEMENT first_name (#PCDATA)>
    <!ELEMENT last_name (#PCDATA)>
    <!ATTLIST student id CDATA #REQUIRED>
  ]>

  For example:
    student_list.student.first_name.S
    student_list.student.@id

• The term S represents a string value (corresponding to the #PCDATA keyword in the DTD)
• For example, if the element name is first_name and the element is <first_name>Paul</first_name>, then S = Paul
• Redundancy occurs when data corresponding to a single element is stored more than once
• Update anomalies take two forms:
  • Because data for an element is stored multiple times, updating one copy creates an inconsistency
  • Removing an element may remove the information it carries from the document entirely
• Examples of the above will be given later in the presentation

Boyce-Codd Normal Form
• A relation schema is in BCNF if and only if for every one of its nontrivial FDs X → Y, X is a superkey (X is either a candidate key or a superset thereof)
• Simply speaking, for each distinct X, there is exactly one value for Y (no redundancy)
• Note that the number of attributes in the key X should be minimized for ease of identification among individual tuples
• Examples:
  • sid, first_name, last_name → age (BAD: not minimum size)
  • sid → first_name, last_name, age (GOOD: only one attribute)
  • cid → course_name, semester_offered
  • course_name → course_description

XNF Versus BCNF
• XNF generalizes Boyce-Codd Normal Form
• XNF disallows redundancy-causing FDs

XML Normal Form
• Let P1 and P2 be paths in an XML document
• A DTD D with its set of FDs F is in XNF if and only if, for every nontrivial FD implied by F of the form P1 → P2.@a (where @a is an attribute) or P1 → P2.S (where S is the string value of an element), it is the case that P1 → P2 is also implied by F
• In layman’s terms, for each distinct value of P1, there is only one P2 node
• This is remarkably similar to our definition of BCNF!
• In fact, a relational database schema is in BCNF if and only if its XML schema equivalent is in XNF (this will not be proven here)

  <!DOCTYPE student_list [
    <!ELEMENT student_list (student)*>
    <!ELEMENT student (first_name, last_name)>
    <!ELEMENT first_name (#PCDATA)>
    <!ELEMENT last_name (#PCDATA)>
    <!ATTLIST student id CDATA #REQUIRED>
  ]>

  student_list.student.@id → student_list.student.first_name.S, student_list.student.last_name.S

Relational Schema to XML
• Let R be a relation over attributes A, B, C
• The schema R(A, B, C) with FD A → BC translates to:

  <!ELEMENT db (G*)>
  <!ELEMENT G EMPTY>
  <!ATTLIST G
    A CDATA #REQUIRED
    B CDATA #REQUIRED
    C CDATA #REQUIRED>

  ... with FD db.G.@A → db.G.@B, db.G.@C

Usage
• The following algorithm should be applied in the design stage of XML database creation
• Once data exists in the XML database, it can be very tedious and/or difficult to modify the schema (also, errors may be introduced if the database modifications are done by hand)

Assumptions
• DTDs are assumed to be nonrecursive (recursive DTDs lead to an infinite number of paths)
• Note that we can allow for recursion by considering that FDs only specify a finite number of paths and so we can restrict our attention to a finite number of ‘unfoldings’ of the recursive rules
• FDs are assumed to have at most one element path on the left-hand side of the rule (that is, FDs are of the form { p, p1.@a1, p2.@a2, ..., pn.@an } → q)

Basic Operations
• Move attributes / child elements from an existing element to another one
• Create a new element type

Algorithm
• Given a DTD D and set of FDs F:
  • If (D, F) is in XNF, return
  • Otherwise, find an anomalous FD and use the two basic operations to modify D to eliminate the anomalous FD
  • Repeat the above; the first step will cause the algorithm to terminate once (D, F) is in XNF
• Just like other normalization algorithms (for 1NF, 2NF, 3NF, and BCNF), the algorithm:
  • Is simple
  • Decomposes the schema into separate data structures (tables for relational databases, trees for XML)
  • Preserves FDs (it is lossless)
  • Always terminates (this will not be proven here)

Example Schema

  <!DOCTYPE courses [
    <!ELEMENT courses (course*)>
    <!ELEMENT course (title, taken_by)>
    <!ATTLIST course cno CDATA #REQUIRED>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT taken_by (student*)>
    <!ELEMENT student (name, grade)>
    <!ATTLIST student sid CDATA #REQUIRED>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT grade (#PCDATA)>
  ]>

  FDs:
    courses.course.@cno → courses.course
    { courses.course, courses.course.taken_by.student.@sid } → courses.course.taken_by.student
    courses.course.taken_by.student.@sid → courses.course.taken_by.student.name.S

• The previous FDs enforce the following constraints:
  • A course ID uniquely identifies a course
  • Two distinct students of the same course cannot have the same student ID
  • Two students with the same student ID must have the same name

Schema Problems
• Or do they? Consider the third FD:
    courses.course.taken_by.student.@sid → courses.course.taken_by.student.name.S
• By XNF, the following must hold:
    courses.course.taken_by.student.@sid → courses.course.taken_by.student.name
• It does not. Why?
• A single @sid value can identify two distinct name nodes!
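The failure of the XNF condition can be checked mechanically on a small instance document. The following is a minimal sketch, not part of the original slides: the sample document, its values, and the function names are hypothetical, the student key is written as @sid as in the FDs, and "same node" is approximated by Python object identity on the parsed tree.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical instance of the example schema: student st1 takes two courses.
DOC = """
<courses>
  <course cno="csc200">
    <title>Automata Theory</title>
    <taken_by>
      <student sid="st1"><name>Deere</name><grade>A+</grade></student>
      <student sid="st2"><name>Smith</name><grade>B-</grade></student>
    </taken_by>
  </course>
  <course cno="mat100">
    <title>Calculus I</title>
    <taken_by>
      <student sid="st1"><name>Deere</name><grade>A-</grade></student>
    </taken_by>
  </course>
</courses>
"""

def names_by_sid(root):
    """Group the name elements on path courses.course.taken_by.student.name by @sid."""
    groups = defaultdict(list)
    for student in root.findall("./course/taken_by/student"):
        groups[student.get("sid")].append(student.find("name"))
    return groups

def fd_sid_to_name_text(root):
    """@sid -> name.S: students with equal @sid carry equal name strings."""
    return all(len({n.text for n in nodes}) == 1
               for nodes in names_by_sid(root).values())

def fd_sid_to_name_node(root):
    """@sid -> name: students with equal @sid share one and the same name node."""
    return all(len({id(n) for n in nodes}) == 1
               for nodes in names_by_sid(root).values())

root = ET.fromstring(DOC)
print(fd_sid_to_name_text(root))  # True: the string values agree
print(fd_sid_to_name_node(root))  # False: st1 has two separate name nodes
```

The first check corresponds to the third FD; the second corresponds to the FD that XNF additionally requires. The gap between the two results is exactly the stored redundancy.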
• The third FD can be violated under the current schema
• This is because a copy of the name element is stored for every occurrence of a given @sid; because of this, changing a value in one place introduces inconsistency
• Also, deleting student information from a course could remove that student from the database if only one copy of that student’s information exists
• The above two points are examples of update anomalies

Using the Algorithm
• Fix by creating a new element type student_info with @sid as its key
• Move the name element from the student element to the student_info element
• Though it is not part of the algorithm, we will modify the root element name from “courses” to “db” (database) to better reflect intended semantics

  <!DOCTYPE db [
    <!ELEMENT db (course*, student_info*)>
    <!ELEMENT course (title, taken_by)>
    <!ATTLIST course
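
The two basic operations used in this fix (create a new element type, move a child element) can be illustrated on an instance document. This is a rough sketch under stated assumptions, not the slides' algorithm: the source text breaks off before the full revised DTD is shown, so the exact shape of student and student_info used here (student keeps @sid and grade; student_info holds @sid and name) is assumed, and the sample data and the function name normalize are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of the original (non-XNF) schema.
DOC = """
<courses>
  <course cno="csc200">
    <title>Automata Theory</title>
    <taken_by>
      <student sid="st1"><name>Deere</name><grade>A+</grade></student>
      <student sid="st2"><name>Smith</name><grade>B-</grade></student>
    </taken_by>
  </course>
  <course cno="mat100">
    <title>Calculus I</title>
    <taken_by>
      <student sid="st1"><name>Deere</name><grade>A-</grade></student>
    </taken_by>
  </course>
</courses>
"""

def normalize(courses_root):
    """Build a <db> tree in which name lives once per @sid under student_info.

    Assumed target shape: db (course*, student_info*), with each student
    reduced to @sid plus grade. The shape is an assumption, not the source's.
    """
    db = ET.Element("db")
    info = {}  # sid -> name string, first occurrence wins
    for course in courses_root.findall("./course"):
        new_course = ET.SubElement(db, "course", cno=course.get("cno"))
        ET.SubElement(new_course, "title").text = course.findtext("title")
        taken_by = ET.SubElement(new_course, "taken_by")
        for student in course.findall("./taken_by/student"):
            sid = student.get("sid")
            info.setdefault(sid, student.findtext("name"))
            new_student = ET.SubElement(taken_by, "student", sid=sid)
            ET.SubElement(new_student, "grade").text = student.findtext("grade")
    # New element type: one student_info per distinct @sid, holding the moved name.
    for sid, name in info.items():
        student_info = ET.SubElement(db, "student_info", sid=sid)
        ET.SubElement(student_info, "name").text = name
    return db

db = normalize(ET.fromstring(DOC))
print(ET.tostring(db, encoding="unicode"))
```

After the transformation each name is stored once, so renaming a student touches a single node, and dropping a student from a course no longer deletes the student's name.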