Alphabet Soup: The XML Document
Overview
The first tutorial in this series introduced the core Extensible Markup Language (XML) technologies. This tutorial covers the construction of a well-formed XML document. An XML document contains data marked up with tags that define the meaning of the data. In this tutorial, we use Microsoft Internet Explorer and Crimson Editor, a freeware text editor, to display and edit XML documents.
Well-formed Data
At a minimum, XML requires data to be well formed. This means that all data nest between opening tags and closing tags, that each closing tag is identical to its opening tag except that the name in the closing tag is prefixed with a slash character (/), and that the tags balance.
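The well-formed requirement can also be checked programmatically. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree parser rather than the tools used in this tutorial. The helper name is_well_formed and the sample element names are hypothetical and are not taken from the tutorial files.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample data; the element names are illustrative only.
good = "<Names><Name>Example</Name></Names>"
bad = "<Names><Name>Example</Names>"  # the Name element is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

The parser rejects the second string because the inner element has no closing tag, so the tags do not balance.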
Well-formed XML Data: Open Internet Explorer. In Internet Explorer, open WellFormed.xml (click File > Open > Browse, navigate to the location where you stored the XML files, change Files of type to All Files, select WellFormed.xml, click Open, and click OK). Internet Explorer displays the well-formed file as shown in Figure 1.
Figure 1: WellFormed.xml
Each XML document must have exactly one root element. WellFormed.xml has a Names root element; the first line shows the <Names> opening tag and the last line shows the </Names> closing tag. Notice that Internet Explorer shows all tags and data. Click the minus sign (-) beside the <Names> opening tag to collapse the element (Figure 2).
Figure 2: Collapsed WellFormed.xml
As shown in Figure 1, each individual name element rests between an opening tag and a closing tag. You have probably noticed that most of the names do not make much sense. Do not worry about this yet; we will address the actual contents of the elements in the next tutorial.
Problems with XML Data: XML data are not well formed if there are any problems with the tags used or the tags do not balance. When an XML-enabled application, like Internet Explorer, encounters a problem with XML data, the application displays an error message. Open NotWellFormed.xml in Internet Explorer. Internet Explorer displays an error message similar to Figure 3.
Figure 3: XML Error 1
On line three, the closing tag starts with a lowercase letter but the opening tag starts with an uppercase letter. Leave Internet Explorer open and open NotWellFormed.xml in Crimson Editor. Correct the case of the closing tag (Figure 4). Save the file.
Figure 4: Corrected Closing Tag on Line 3
Refresh the page in Internet Explorer. Internet Explorer detects another error and displays an error message (Figure 5).
Figure 5: XML Error 2
In this case, Internet Explorer reports that the closing tag and the opening tag do not match. In Crimson Editor, add a closing tag on line 4 as shown in Figure 6. Save the file.
Figure 6: Line 4 Correction
Refresh the page in Internet Explorer. Unfortunately, Internet Explorer reports the same error as in Figure 5. This means a problem with tags balancing still exists. Working backwards starting at line 4, verify that each element has a valid opening tag and closing tag. Lines 3 and 4 are OK; however, the closing tag in line 2 does not have a slash. Add the slash to the closing tag (Figure 7). Save the file.
Figure 7: Slash in Closing Tag
Refresh the page in Internet Explorer. At this point, the document is well formed (Figure 8).
Figure 8: Corrected XML Document
The problems encountered illustrate several rules related to XML documents:
1. XML is case sensitive – uppercase and lowercase characters are different.
2. Opening and closing tags must be the same, except the closing tag starts with a slash (/).
3. Tags must balance – every opening tag must have a closing tag.
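The three rules above can be illustrated with a short sketch, again assuming Python's standard xml.etree.ElementTree parser. The snippets and element names below are hypothetical; they are not the contents of NotWellFormed.xml, and the error messages the parser produces differ from those shown by Internet Explorer.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets, one per rule; the element names are illustrative.
samples = {
    "case mismatch": "<Names><Name>A</name></Names>",
    "missing slash": "<Names><Name>A<Name></Names>",
    "missing closing tag": "<Names><Name>A</Name>",
    "well formed": "<Names><Name>A</Name></Names>",
}


def check(xml_text):
    """Return None if xml_text is well formed, else the parser's message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)


results = {label: check(text) for label, text in samples.items()}
for label, message in results.items():
    print(f"{label}: {message or 'OK'}")
```

Each of the first three snippets breaks exactly one rule, so correcting them one at a time mirrors the edit, save, and refresh cycle used in the walkthrough above.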
XML Hierarchy
An XML document conforms to a hierarchical structure similar to an inverted tree. The root element (or root entity) is the top of the tree and represents the contents of the XML document as a whole (Figure 9). Each element below the root element is either a simple element or a complex element. A simple element represents a single fact. A complex element represents multiple facts clustered together (e.g., an address), multiple occurrences of the same type of information (e.g., family members), or some combination of these.
Figure 9: XML Document Hierarchical Structure
Root Element
    Element 1
        Element 1.1
        Element 1.2
        Element 1.n
            Element 1.n.1
            Element 1.n.2
            Element 1.n.n
    Element 2
    Element n
Figure 10 illustrates the differences between a simple element and a complex element, using a “bank account” structure. In this structure, the account id and balance represent facts about a bank account. A single bank account may have multiple account holders, each with a name and tax id. Based on this structure, a “bank account” is a complex element since it breaks down into a set of elements. The “account id” and “balance” elements are simple elements since they represent individual facts. An “account holder” is a complex element since it contains two simple elements, “holder name” and “holder tax id.” The collection of “account holders” is a complex element since it may consist of multiple occurrences of “account holder.”
Figure 10: Initial Document Hierarchy
Bank Account
    Account ID
    Balance
    Account Holders
        Account Holder (1)
            Holder Name
            Holder Tax ID
        Account Holder (2)
            Holder Name
            Holder Tax ID
        Account Holder (3)
            Holder Name
            Holder Tax ID
Figure 11 shows sample data in the document hierarchy. Each complex element that breaks down into another level of the hierarchy forms a node. As shown in the sample, a node may contain lower-level nodes. For example, the “account holders” node contains multiple nodes, one for each account holder. The illustration shows three child nodes for the “account holders” parent node; however, nothing defined so far specifies this limit. In the next tutorial, we describe a way to limit the number of child nodes.
Figure 11: Sample Data in the Document Hierarchy
Bank Account
    Account ID: 90210222
    Balance: 23234079
    Account Holders
        Account Holder (1)
            Holder Name: H. Simpson
            Holder Tax ID: 24512423
        Account Holder (2)
            Holder Name: M. Szyslak
            Holder Tax ID: 53445231
        Account Holder (3)
            Holder Name: N. Flanders
            Holder Tax ID: 11234129
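The hierarchy in Figures 10 and 11 can be sketched as an XML document and walked programmatically. This is only an illustration in Python: the element names (BankAccount, AccountID, Balance, AccountHolders, AccountHolder, HolderName, HolderTaxID) are assumptions and may differ from the names used in bankacct1.xml; the values come from Figure 11. The sketch treats any element with child elements as complex and any element without them as simple.

```python
import xml.etree.ElementTree as ET

# Element names are assumptions; the values come from Figure 11.
BANK_ACCOUNT = """
<BankAccount>
  <AccountID>90210222</AccountID>
  <Balance>23234079</Balance>
  <AccountHolders>
    <AccountHolder>
      <HolderName>H. Simpson</HolderName>
      <HolderTaxID>24512423</HolderTaxID>
    </AccountHolder>
    <AccountHolder>
      <HolderName>M. Szyslak</HolderName>
      <HolderTaxID>53445231</HolderTaxID>
    </AccountHolder>
    <AccountHolder>
      <HolderName>N. Flanders</HolderName>
      <HolderTaxID>11234129</HolderTaxID>
    </AccountHolder>
  </AccountHolders>
</BankAccount>
"""


def describe(element, depth=0):
    """Return indented lines labelling each element as simple or complex."""
    kind = "complex" if len(element) else "simple"
    lines = [f"{'  ' * depth}{element.tag} ({kind})"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(BANK_ACCOUNT.strip())
outline = describe(root)
print("\n".join(outline))

# Nothing in the document itself limits the number of child nodes.
holder_count = len(root.find("AccountHolders"))
print(holder_count)
```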
Open bankacct1.xml in Internet Explorer (Figure 12). Internet Explorer includes a minus sign at each level of the hierarchy so that you can collapse and expand the hierarchy as desired.
Figure 12: BankAcct1.xml in Internet Explorer
The plus sign beside the second account holder indicates that the node is collapsed; click the plus sign to expand it.
Open bankacct1.xml in Crimson Editor. Make the following changes. Change the account id to 902102. Change the balance to 2323.40. Save the changes. Refresh the page in Internet Explorer. The changes should display as entered.
Delete the second account holder’s information (M. Szyslak). Be sure to delete the entire node as indicated in Figure 13. Save the changes. Refresh the page in Internet Explorer. The page should look similar to Figure 14.
Figure 13: Deleting the M. Szyslak Node
Figure 14: Updated BankAcct1.xml
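The same edits can be sketched in code instead of a text editor. This is a hedged illustration in Python; the element names are assumptions and may not match those in bankacct1.xml, while the values are the ones used in this tutorial.

```python
import xml.etree.ElementTree as ET

# Element names are assumptions; the values follow the tutorial's sample data.
ACCOUNT = """
<BankAccount>
  <AccountID>90210222</AccountID>
  <Balance>23234079</Balance>
  <AccountHolders>
    <AccountHolder>
      <HolderName>H. Simpson</HolderName>
      <HolderTaxID>24512423</HolderTaxID>
    </AccountHolder>
    <AccountHolder>
      <HolderName>M. Szyslak</HolderName>
      <HolderTaxID>53445231</HolderTaxID>
    </AccountHolder>
    <AccountHolder>
      <HolderName>N. Flanders</HolderName>
      <HolderTaxID>11234129</HolderTaxID>
    </AccountHolder>
  </AccountHolders>
</BankAccount>
"""

root = ET.fromstring(ACCOUNT.strip())

# Change the two simple elements.
root.find("AccountID").text = "902102"
root.find("Balance").text = "2323.40"

# Delete the entire node for the second account holder, not just its children.
holders = root.find("AccountHolders")
second_holder = holders.findall("AccountHolder")[1]
holders.remove(second_holder)

remaining = [holder.findtext("HolderName") for holder in holders]
print(remaining)
print(ET.tostring(root, encoding="unicode"))
```

Removing the whole AccountHolder element keeps the document well formed, because its opening tag, closing tag, and contents are removed together.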
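Switch back to Crimson Editor and delete lines 10 (the holder name, N. Flanders) and 11 (the holder tax ID). Save the changes and refresh the page in Internet Explorer (Figure 15).
Figure 15: XML Document with Empty Node
Internet Explorer displays the empty node as a single empty-element tag, in which the slash appears at the end of the tag, rather than as separate opening and closing tags.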
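An empty node can be written either as an opening tag followed immediately by a closing tag or as a single empty-element tag; both forms are well formed. The sketch below, in Python, assumes a hypothetical AccountHolder element name and shows how the standard xml.etree.ElementTree serializer writes an element once its children are removed.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the values are those of the third account holder.
holder = ET.fromstring(
    "<AccountHolder>"
    "<HolderName>N. Flanders</HolderName>"
    "<HolderTaxID>11234129</HolderTaxID>"
    "</AccountHolder>"
)

# Remove both child elements, leaving an empty node.
for child in list(holder):
    holder.remove(child)

empty = ET.tostring(holder, encoding="unicode")
print(empty)  # <AccountHolder />

# The long form parses to the same empty element.
long_form = ET.fromstring("<AccountHolder></AccountHolder>")
same_shape = len(long_form) == 0 and long_form.text is None
print(same_shape)  # True
```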
Summary
This tutorial described the primary components of an XML document. The next tutorial introduces XML schema definitions.
XML Resources
Crimson Editor, www.crimsoneditor.com.
Microsoft Internet Explorer, www.microsoft.com.