XML Database

XML Database Introduction:

Relational Database:
A relational database is a powerful data storage and retrieval technology in which data is stored as rows in tables, and the database has one or more tables. Each row of a table has the same columns as every other row in that table. Data is related between tables using the concept of “foreign keys”, so that data in a row of one table can be associated with one or more rows of another table. Data in a relational database is read by executing SQL queries, for example in a management tool, which extract and present the data in any number of ways. The extraction requires an understanding of the database structure, including the foreign key relationships. Designing a good non-trivial relational database requires significant training and/or significant experience with relational database design techniques.

XML Database:
XML has emerged as the standard for representing and exchanging data on the World Wide Web. The growing number of XML documents creates a need to store and query them efficiently. An XML database is used to store large amounts of information in XML format.

XML Schema is commonly known as XML Schema Definition (XSD). It is used to describe and validate the structure and the content of XML data. An XML schema defines the elements, attributes and data types.

XML is becoming the predominant data format in a variety of application domains (e.g., supply chain, scientific data processing, telecommunication infrastructure). Many such applications produce and consume large volumes of XML data and thus require efficient and reliable storage systems. The use of relational database systems for this purpose has attracted considerable interest from both the research community and the database vendors.

Elements are the fundamental units of XML content.
Element name: the name wrapped in tags (markup), which describes the content (metadata).
Element content: anything that goes between a pair of opening and closing tags.
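To make the element terminology concrete, here is a minimal sketch that uses Python's standard xml.etree.ElementTree module to parse a small fragment and list each element's name and content. The <employee> fragment and its values are made up for illustration and do not come from any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, used only to illustrate element names and content.
fragment = "<employee><name>Ruth Sam</name><phone>443-123-4567</phone></employee>"

root = ET.fromstring(fragment)

# Each child element has a name (its tag) and content (here, plain text).
pairs = [(child.tag, child.text) for child in root]

for tag, text in pairs:
    print(f"element name: {tag!r}, element content: {text!r}")
```

The tag is the metadata that describes the value, which is what makes the fragment readable without a separate column definition.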
Major differences between XML data and relational data

XML data is hierarchical; relational data is represented in a model of logical relationships
An XML document contains information about the relationship of data items to each other in the form of the hierarchy. With the relational model, the only types of relationships that can be defined are parent table and dependent table relationships.

XML data is self-describing; relational data is not
An XML document contains not only the data, but also tagging for the data that explains what it is. A single document can have different types of data. With the relational model, the content of the data is defined by its column definition. All data in a column must have the same type.

XML data has inherent ordering; relational data does not
For an XML document, the order in which data items are specified is assumed to be the order of the data in the document. There is often no other way to specify order within the document. For relational data, the order of the rows is not guaranteed unless you specify an ORDER BY clause on one or more columns.

Comparison of Concepts between XML Database and Relational Database Systems:

1. Structuring and Typing Mechanisms
The basic mechanisms used to specify the structure of XML documents and relational schemata are element types and attributes for XML, and relations and attributes for RDBS. Each XML document requires that all component element types are rooted in a single element type. This is in contrast to RDBS, where part-of hierarchies cannot be realized by means of nesting, since relations consist only of atomic-valued attributes. However, part-of hierarchies can be expressed in RDBS by means of foreign key constraints.

2. Uniqueness of Names
The name of a relation is required to be unique within the whole relational schema, similar to the name of an XML element type being unique throughout the DTD.
By means of so-called namespaces, XML allows different element types to have the same name, provided they belong to different namespaces, which are distinguished in a document by namespace prefixes. The name of an XML attribute defined within a DTD or an XML Schema has to be unique within its element type, again similar to an RDBS attribute's name, which has to be unique within its relation.

3. Null Values and Default Values
Similar to RDBS, XML allows null values as well as default values to be expressed. In RDBS the concept of null values is defined for attributes only. XML, however, supports null values for both attributes and elements. In DTDs, default values may be applied to XML attributes only, whereas XML Schema supports default values for XML element types, too.

4. Identification
In RDBS, the unique identification of tuples is done by means of a primary key, which may be composed of one or more attributes of the corresponding relation. In DTDs, only a single attribute of an element type can be designated as the identifying attribute, by means of the special attribute type ID, which may in turn contain a string value. XML Schema allows not just attributes, but also element types of any arbitrary atomic domain, and combinations thereof, to serve as keys.
The scope of identification in RDBS is a single relation, i.e., the value of the primary key uniquely identifies each tuple within a relation. In DTDs, the scope of identification is broader in the sense that the value of an ID attribute is unique within the whole XML document. This allows the unique identification of an element not only with respect to other elements of the same element type but across all elements of any element type. XML Schema allows the scope of each key to be specified by means of an XPath expression.

5. Relationships
In RDBS, relationships between relations can be expressed by means of foreign keys, i.e., arbitrary attributes that refer to the primary key of the same relation or of another relation.
The number of tuples which may participate in a relationship can be constrained by defining the foreign key as NOT NULL and/or UNIQUE.

Fig: Comparison of Relationships

Example showing XML and Relational Database

EmployeeDB
Name          Company        Phone
Ruth Sam      XML Database   443-123-4567
Tommy Bryan   XML Database   443-789-4567

    <?xml version="1.0"?>
    <EmployeeDB>
      <Employee1>
        <name>Ruth Sam</name>
        <company>XML Database</company>
        <phone>443-123-4567</phone>
      </Employee1>
      <Employee2>
        <name>Tommy Bryan</name>
        <company>XML Database</company>
        <phone>443-789-4567</phone>
      </Employee2>
    </EmployeeDB>

XML Database Types
There are three different types of XML databases:

1. Native XML Database (NXD):
(a) Defines a (logical) model for an XML document (as opposed to the data in that document) and stores and retrieves documents according to that model. At a minimum, the model must include elements, attributes, PCDATA, and document order. Examples of such models are the XPath data model, the XML Infoset, and the models implied by the DOM and the events in SAX 1.0.
(b) Has an XML document as its fundamental unit of (logical) storage, just as a relational database has a row in a table as its fundamental unit of (logical) storage.
(c) Is not required to have any particular underlying physical storage model. For example, it can be built on a relational, hierarchical, or object-oriented database, or use a proprietary storage format such as indexed, compressed files.

2. XML Enabled Database (XEDB):
A database that has an added XML mapping layer provided either by the database vendor or a third party. This mapping layer manages the storage and retrieval of XML data. Data that is mapped into the database is mapped into application-specific formats, and the original XML metadata and structure may be lost. Data retrieved as XML is NOT guaranteed to have originated in XML form. Data manipulation may occur via either XML-specific technologies (e.g. XPath, XSLT, DOM or SAX) or other database technologies (e.g. SQL). The fundamental unit of storage in an XML Enabled Database is implementation dependent.

3. Hybrid XML Databases (HXD):
A database that can be treated as either a Native XML Database or as an XML Enabled Database depending on the requirements of the application.

XML Documents can be Data-Centric and Document-Centric

Data-centric documents are produced as an import or export format; that is, data-centric XML documents are meant for machine consumption. These documents are used for communicating data between companies or applications, and the fact that XML is used as a common format is simply a matter of convenience, for reasons of interoperability. Examples of data-centric documents are sales orders, scientific data, and stock quotes.

Document-centric documents are usually designed for human consumption, with examples ranging from books to hand-written XHTML documents. They are usually composed directly in XML, or in some other format and then converted to XML. Document-centric documents do not need to have a regular structure, have coarse-grained data (that is, the smallest independent data unit may well be the document itself) and have mixed content. For example, the following memo document is document-centric.

Converting XML to relational database
There are various ways to convert XML data into and out of relational databases effectively and automatically. DB2 offers two methods for shredding XML data, that is, for decomposing an XML document into rows of relational tables. The first method uses SQL INSERT statements with the XMLTABLE function. One such INSERT statement is required for each target table, and multiple statements can be combined in a stored procedure to avoid repetitive parsing of the same XML document. The shredding statements can include XQuery and SQL functions, joins to other tables, or references to DB2 sequences. These features allow for customization and a high degree of flexibility in the shredding process, but require manual coding.
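The idea of shredding can be sketched outside DB2 as well. The following Python example is not DB2's XMLTABLE mechanism. It is an analogous, hand-coded shred that parses the EmployeeDB document from the example above once and inserts one row per employee into an in-memory SQLite table. The table name employee and the helper shred are hypothetical names chosen for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The EmployeeDB document from the example above.
XML_DOC = """<?xml version="1.0"?>
<EmployeeDB>
  <Employee1>
    <name>Ruth Sam</name>
    <company>XML Database</company>
    <phone>443-123-4567</phone>
  </Employee1>
  <Employee2>
    <name>Tommy Bryan</name>
    <company>XML Database</company>
    <phone>443-789-4567</phone>
  </Employee2>
</EmployeeDB>"""


def shred(xml_text, conn):
    """Parse the document once and insert one row per child of the root.

    Hypothetical helper: a stand-in for a hand-written shredding step.
    """
    root = ET.fromstring(xml_text)
    # Map each employee element's children onto the columns of one row.
    rows = [
        (emp.findtext("name"), emp.findtext("company"), emp.findtext("phone"))
        for emp in root
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS employee (name TEXT, company TEXT, phone TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee (name, company, phone) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = shred(XML_DOC, conn)

# Relational rows have no inherent order, so the query states one explicitly.
result = conn.execute(
    "SELECT name, company, phone FROM employee ORDER BY name"
).fetchall()
print(inserted, result)
```

As with the INSERT-based method described above, the mapping from elements to columns is written by hand, which is flexible but has to be repeated for every target table.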
