1.XML
1.1 XML Overview XML stands for Extensible Markup Language and is a text-based markup language derived from Standard Generalized Markup Language (SGML). XML tags identify the data and used to store and organize data, rather than specifying how to display it like HTML tags are used to display the data. XML is not going to replace HTML in the near but introduces new possibilities by adopting many successful features of HTML. There are three important characteristics of XML that make it useful in a variety of systems and solutions: XML is extensible : XML essentially allows you to create your own language, or tags, that suits your application. XML separates data from presentation : XML allows you to store content with regard to how it will be presented. XML is a public standard : XML was developed by an organization called the World Wide Web Consortium (W3C) and available as an open standard.
1.1.1 XML Usage A short list of XML's usage says it all: XML can work behind the scene to simplify the creation of HTML documents for large web sites. XML can be used to exchange of information between organizations and systems. XML can be used for offloading and reloading of databases. XML can be used to store and arrange data in a way that is customizable for your needs. XML can easily be mixed with stylesheets to create almost any output desired. Virtually any type of data can be expressed as an XML document. 1.1.2 What is Markup? Markup is information added to a document that enhances its meaning in certain ways, in that it identifies the parts and how they relate to each other. Following is an example of how XML markup looks when embedded in a piece of text:
1.1.3 Is XML a Programming Language? A programming language is a vocabulary and syntax for instructing a computer to perform specific tasks. XML doesn't qualify as a programming language because it doesn't instruct a computer to do anything, as such. It's usually stored in a simple text file and is processed by special software that's capable of interpreting XML. Following is the simple syntax rules to be followed to write a XML document.
Wrong nested tags Correct nested tags
(3) Root element : A XML document can have only one root element. For example following is not a well-formed XML document, because both the x and y elements occur at the top level:
1.3.1 Syntax Rules for XML Attributes (1) - Attribute names in XML (unlike HTML) are case sensitive i.e HREF and href refer to two different XML attributes. (2) - Same attribute cannot have two values. The following example is not well-formed because the b attribute is specified twice: .... (3) - An attribute name should not be defined in quotation marks, whereas attribute values must always appear in quotation marks. Following example demonstrates WRONG xml format: ....
1.4 XML References References usually allow you to add or include additional text or markup in an XML document. References always begin with the character "&" (which is specially reserved) and end with the character ";".
XML has two types of references:
(1) Entity References : An entity reference contains a name between the start and end delimiters. For example & where amp is name. The name refers to a predefined string of text and/or markup. (2) Character References : These contain references, like A, contains a hash mark (“#”) followed by a number. The number always refers to the Unicode code for a single character. In this case 65 refers to letter "A".
1.5 Documents An XML document is a basic unit of XML information, composed of elements and other markup in an orderly package. An XML document can contain any data, say for example, database of numbers, or some abstract structure representing a molecule or equation. Example: A simple document would look like as in the following example: document prolog
1.6 Comments XML comments are similar to HTML comments. Comments are added as notes or lines for understanding purpose of a XML code.Comments can be used to include related links, information and terms. Comments can be viewed only in the source code not in the XML code. Comments may appear anywhere in the XML code.
Syntax: Comments starts with . You can add textual notes as a comment between the characters. Never nest one comment with other comment. Example:
1.7 Character Entities Character Entities are used to name some of the entities which are symbolic representation of information i.e characters that are difficult or impossible to type can be substituted by Character Entities.There are three types of character entities: Predefined Character Entities Numbered Character Entities Named Character Entities
1.7.1 Predefined Character Entity These are introduced to avoid the ambiguity while using some of symbols. For example ambiguity observed when less than ( < ) or greater than ( > ) symbol is used with the angle tag(<>). These are basically used to delimit tags in XML. Following is a list of pre-defined character entities from XML specification. These can be used to express characters safely. ampersand: & Single quote: ' Greater than: > Less than: < Double quote: "
1.7.2 Numbered Character Reference Numeric reference can either be in decimal or hexadecimal numbers. Numeric reference refers to the character by its number in the Unicode character set. General syntax for decimal numeric reference is: decimal number ; General syntax for hexadecimal numeric reference is: Hexadecimal number ; Character Entities with their Numeric values Entity Decimal Hexadecimal Character name reference reference
Quot " " "
Amp & & &
Apos ' ' '
Lt < < <
Gt > > > 1.7.3 Named Character Entity As its hard to remember the numeric characters, the most preferred type of character entity is the named character entity. Here each entity is been identified with a name.
Example: 'Aacute' represents capital character with acute accent. 'ugrave' represents the small with grave accent. 1.8 CDATA Sections CDATA ( Character Data) are used to escape block of text that does not parsed by the parser and are otherwise recognized as markup. The predefined entities such as <, >, and & require typing and are generally hard to read in the markup. For such cases CDATA section can be used.
Syntax: Details of the above syntax:
CDATA Start section - CDATA begins with the nine-character delimiter delimiter. CData section - Characters between these two enclosures are interpreted as characters, and not as markup. This section may contain markup characters (<, >, and &), but they are ignored by the XML processor. CDATA cannot contain the string "]]>" anywhere in the XML document.Nesting is not allowed in CDATA section.
Example: Here is the example of CDATA. Here everything written inside the CDATA section will be ignored by the parser.
Here
1.9 Whitespaces Whitespaces is a collection of spaces, tabs, and newlines. These are usually used to make a document more readable. XML document contain two types of white spaces (a) Significant Whitespace and (b) insignificant Whitespace.
1.9.1 Significant Whitespace A significant whitespace occurs within the element which contain text and markup mixed together. For example:
These two elements are different because of that space between Akilesh and Prathick, and any program reading this element in an XML file is obliged to maintain the distinction.
1.9.2 Insignificant Whitespace Insignificant whitespace means the spaces where only element content is allowed. For example:
The above two examples are same.(Here space is represented by dots (.)). Here the space between address and category is insignificant. A special attribute named xml:space may be attached to an element. This indicates that white space should not be removed by applications for that element.
You can set default or preserve to this attribute as shown in the example below: Where: The value default signals that applications' default white-space processing modes are acceptable for this element; The value preserve indicates the intent that applications preserve all the white space.
1.10 Encoding Encoding is the process of converting characters into their equivalent binary representation. When an XML processor reads an XML document, depending on the type of encoding it encodes the document. Hence we need to specify the type of encoding in the XML declaration.There are mainly two types of encoding present: UTF-8 UTF-16 UTF stands for UCS Transformation Format, and UCS itself means Universal Character Set. The number refers to how many bits are used to represent a simple character, either 8(one byte) or 16(two bytes). UTF-8 is the default for documents without encoding information. Syntax: UTF-8 encoding: UTF-16 encoding: Example:
1.11 Validation Validation is a process by which an XML document is validated. An XML document is said to be valid if its content matches with the elements, attributes and other piece of an associated document type declaration(DTD) and if the document complies with the constraints expressed in it.
Validation is dealt in two ways by the XML parser,
Well-formed XML document Valid XML document
1.11.1 Well-formed XML document An XML document is said to be well-formed if it adheres to the following rules:
NON DTD XML files must use the predefined character entities for amp(&), apos(single quote), gt(>), lt(<), quot(double quote). It must follow the ordering of the tag. i.e. Inner tag must be closed before the outer tag is closed. Opening tags must have a closing tag or have themselves self ending.(
Example: ]>
Above example is said to be well-formed as: It defines the type of document. Here document type is element type. It includes a root element named as address. Every child elements are enclosed within the self explanatory tags. Here child elements are name, company and phone. Order of tags is maintained.
1.11.2 Valid XML document If an XML document is well-formed and has an associated Document Type Declaration (DTD), then it is said to be a valid XML document. We will study more about DTD in the XML - DTDs.
1.12 DTD XML Document Type Declaration commonly known as DTD is a way to describe precisely the XML language. DTDs check the validity of structure and vocabulary of XML documents against the grammatical rules of the appropriate XML language. An XML DTD can be specified internally inside the document and it can be kept in a separate document and then liked separately.
Syntax: In the above syntax A DTD starts with
1.12.1 Internal DTD A DTD is referred to as an internal DTD if elements are declared within the XML files. To reference it as internal DTD, standalone attribute in XML declaration must be set to yes; which means it will work on its own and does not depend on external source. Syntax:
where root-element is the name of root element and element-declarations is where you declare the elements.
Example: ]>
Let us go through the above code: Start Declaration - Begin with the XML declaration with the following statement DTD - Immediately following the XML header is the document type declaration, commonly referred to as the DOCTYPE.
The DOCTYPE informs the parser that a DTD is associated with this XML document. The DOCTYPE declaration has an exclamation mark (!) at the start of the element name. DTD Body - This is where you declare elements, attributes, entities, and notations.
Several elements are declared here that make up the vocabulary of the
End Declaration - Finally, the declaration section of the DTD is closed using a closing bracket and a closing angle bracket (]>). This effectively ends the definition, and the XML document immediately follows.
1.12.2 External DTD In external DTD elements are declared outside the XML file. They are accessed by specifying the system attributes which may be either the legal .dtd file or a valid URL. Syntax: where file-name is the file with .dtd extension. Example:
1.13 Schemas XML Schema commonly known as an XML Schema Definition (XSD), and it is used to describe and validate the structure and the content of XML data. XML schemas defines the elements, attributes and data type. Schema element supports the namespaces. It is similar to what a database schema describes the data that can be contained in a database.
Syntax: You simply have to declare the following XML Schema namespace and assign a prefix such as xs in your XML document: <"http://www.w3.org/2015/XMLSchema">
Example:
Now let us use this in type in our example as below:
Note: An attribute in XSD provides extra information within an element. Attributes have name and type property as shown below:
1.14 Tree Structure An XML document is always descriptive. Tree structure often referred to as XML Tree and plays an important role to understand any XML documents easily. The tree structure contains root (parent) elements, child elements and so on. By using tree structure you can observe which branch belongs to which root and which sub-branch belongs to which branch. The search or parsing starts at the root, then move down the first branch to an element, take the first branch from there, and so on to the leaves. Example:
Following tree structure represents the above XML document:
As can be seen in the above diagram, we have a root element as
1.15 DOM The Document Object Model (DOM) is the foundation of Extensible Markup Language, or XML. XML documents have a hierarchy of informational units called nodes; DOM is a way of describing those nodes and the relationships between them. DOM Document is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy allows a developer to navigate around the tree looking for specific information. Because it is based on a hierarchy of information, the DOM is said to be tree based. The XML DOM, on the other hand, also provides an API that allows a developer to add, edit, move, or remove nodes at any point on the tree in order to create an application. Example: The following example (sample.html) parses an XML document ("address.xml") into an XML DOM object and then extracts some information from it with a JavaScript:
DOM example
Company:
Phone:
Apples | Bananas |
Title | Artist |
---|---|
1.17 Parser XML Parser is a software library or a package that provides methods (or interfaces) for client applications to work with XML documents. It checks for proper format of the XML document and may also validate the XML documents. Modern day browsers have built-in XML parsers. Following diagram shows how an XML parser interacts with XML document:
To ease the process of parsing, some commercial products are available that facilitate the breakdown of XML document and yield more reliable results. Some commonly used parsers are listed below:
MSXML (Microsoft Core XML Services): This is Microsoft’s standard set of XML tools including a parser. System.Xml.XmlDocument : This class is part of Microsoft’s .NET library, which contains a number of different classes related to working with XML. Java built-in parser : The Java library has its own parser that you can replace the built-in parser with an external implementation such as Xerces from Apache or Saxon. Saxon : Saxon’s offerings contain tools for parsing, transforming, and querying XML. Xerces : Xerces is implemented in Java and is developed by the famous and open source Apache Software Foundation.
The following code fragment parses an XML document into an XML DOM object: xmlhttp=new XMLHttpRequest(); xmlhttp.open("GET","books.xml",false); xmlhttp.send(); xmlDoc=xmlhttp.responseXML;
The following code fragment parses an XML string into an XML DOM object:
txt="
if (window.DOMParser) {parser=new DOMParser(); xmlDoc=parser.parseFromString(txt,"text/xml"); } else // Internet Explorer { xmlDoc=new ActiveXObject("Microsoft.XMLDOM"); xmlDoc.async=false; xmlDoc.loadXML(txt); }
1.18 XLink XLink defines methods for creating links within XML documents. XLink is used to create hyperlinks within XML documents. Below is a simple example of how to use XLink to create links in an XML document:
To get access to the XLink features we must declare the XLink namespace. The XLink namespace is: "http://www.w3.org/xlink". The xlink:type and the xlink:href attributes in the
XLink Example:
Example explained:
The XLink namespace is declared at the top of the document (xmlns:xlink="http://www.w3.org/xlink") The xlink:type="simple" creates a simple "HTML-like" link The xlink:href attribute specifies the URL to link to (in this case - an image) The xlink:show="new" specifies that the link should open in a new window
1.19 XPointer XPointer allows the links to point to specific parts of an XML document. XPointer uses XPath expressions to navigate in the XML document.
XPointer Example:
In this example, we will use XPointer in conjunction with XLink to point to a specific part of another document.
Note that the XML document above uses id attributes on each element!
So, instead of linking to the entire document (as with XLink), XPointer allows you to link to specific parts of the document. To link to a specific part of a page, add a number sign (#) and an XPointer expression after the URL in the xlink:href attribute, like this: xlink:href="http://dog.com/dogbreeds.xml#xpointer(id('Rottweiler'))". The expression refers to the element in the target document, with the id value of "Rottweiler".
XPointer also allows a shorthand method for linking to an element with an id. You can use the value of the id directly, like this: xlink:href="http://dog.com/dogbreeds.xml#Rottweiler".
The following XML document contains links to more information of the dog breed for each of my dogs:
Anton is my favorite dog. He has won a lot of.....
1.20 XML in HTML The following example shows how to display XML Data in an HTML Table. In this, we open an XML file called "cd_catalog.xml". We then loop through each
Example:
1.21 XML on the Server XML files are plain text files just like HTML files. XML can easily be stored and generated by a standard web server.
Generating XML with ASP XML can be generated on a server without any installed XML software.To generate an XML response from the server - simply write the following code and save it as an ASP file on the web server: <% response.ContentType="text/xml" response.Write("") response.Write("
Generating XML with PHP To generate an XML response from the server using PHP, use following code: "; echo "
7.22.3 Generating XML From a Database XML can be generated from a database without any installed XML software.To generate an XML database response from the server, simply write the following code and save it as an ASP file on the web server: <% response.ContentType = "text/xml" set conn=Server.CreateObject("ADODB.Connection") conn.provider="Microsoft.Jet.OLEDB.4.0;" conn.open server.mappath("/db/database.mdb") sql="select fname,lname from tblGuestBook" set rs=Conn.Execute(sql) response.write("") response.write("
See the real life database output from the ASP file above.This ASP transforms an XML file to XHTML on the server:
<% 'Load XML set xml = Server.CreateObject("Microsoft.XMLDOM") xml.async = false xml.load(Server.MapPath("simple.xml"))
'Load XSL set xsl = Server.CreateObject("Microsoft.XMLDOM") xsl.async = false xsl.load(Server.MapPath("simple.xsl"))
'Transform file Response.Write(xml.transformNode(xsl)) %> Example explained The first block of code creates an instance of the Microsoft XML parser (XMLDOM), and loads the XML file into memory.The second block of code creates another instance of the parser and loads the XSL file into memory.The last line of code transforms the XML document using the XSL document, and sends the result as XHTML to your browser. This ASP example creates a simple XML document and saves it on the server: <% text="