
Alphabet Soup: The XML Document

Overview

The first tutorial in this series introduced the core Extensible Markup Language (XML) technologies. This tutorial covers the construction of a well-formed XML document. An XML document contains data marked up using tags that define the meaning of the data. In this tutorial, we use Internet Explorer and Crimson Editor, a freeware text editor, to display and edit XML documents.

Well-formed Data

As a minimum, XML enforces the well-formed data requirement. This requires that all data nest between opening tags and closing tags, that the tags are identical except that the closing tag has a slash character (/) as a prefix, and that the tags balance.
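The well-formed requirement can also be checked programmatically. The following is a minimal sketch in Python, using the standard library parser xml.etree.ElementTree, which rejects any text that is not well formed. The function name is_well_formed and the sample element names are assumptions made for illustration; they do not come from the tutorial files.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample data; the element names are assumptions.
balanced = "<Names><Name>Homer</Name></Names>"
unbalanced = "<Names><Name>Homer</Names>"  # the Name element is never closed

print(is_well_formed(balanced))    # True
print(is_well_formed(unbalanced))  # False
```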

Well-formed XML Data: In Internet Explorer, open WellFormed.xml (click File → Open → Browse, navigate to the location where you stored the XML files, change Files of type to All Files, select WellFormed.xml, click Open, and click OK). Internet Explorer displays the well-formed file as shown in Figure 1.

Figure 1: WellFormed.xml

Each XML document must have exactly one root element. Because of this, WellFormed.xml has a Names root element; the first line shows the <Names> opening tag and the last line shows the </Names> closing tag. Notice that Internet Explorer shows all tags and data. Click the minus sign (-) beside the opening tag. Internet Explorer collapses the selected element as shown in Figure 2. Click the plus sign (+) to expand the element so that the document looks like Figure 1 again.

Figure 2: Collapsed WellFormed.xml

As shown in Figure 1, each individual name element rests between an opening tag and a closing tag. You have probably noticed that most of the names do not make much sense. Do not worry about this yet; we will address the actual contents of the elements in the next tutorial.

Problems with XML Data: XML data are not well formed if there are any problems with the tags used or the tags do not balance. When an XML-enabled application, like Internet Explorer, encounters a problem with XML data, the application displays an error message. Open NotWellFormed.xml in Internet Explorer. Internet Explorer displays an error message similar to Figure 3.

Figure 3: XML Error 1

On line three, the closing tag starts with a lowercase letter but the opening tag starts with an uppercase letter. Leave Internet Explorer open and open NotWellFormed.xml in Crimson Editor. Correct the case of the closing tag (Figure 4). Save the file.

Figure 4: Corrected Closing Tag on Line 3

Refresh the page in Internet Explorer. Internet Explorer detects another error and displays an error message (Figure 5).

Figure 5: XML Error 2

In this case, Internet Explorer reports that the closing tag and opening tag do not balance. To track down the source of this error, start with the line indicated in Figure 5 and work backwards. Notice that the element on line 4 does not have a closing tag. In Crimson Editor, add a closing tag on line 4 as shown in Figure 6. Save the file.

Figure 6: Line 4 Correction

Refresh the page in Internet Explorer. Unfortunately, Internet Explorer reports the same error as in Figure 5. This means a problem with tags balancing still exists. Working backwards starting at line 4, verify that each element has a valid opening tag and closing tag. Lines 3 and 4 are OK; however, the closing tag in line 2 does not have a slash. Add the slash to the closing tag (Figure 7). Save the file.

Figure 7: Slash in Closing Tag

Refresh the page in Internet Explorer. At this point, the document is well formed (Figure 8).

Figure 8: Corrected XML Document

The problems encountered illustrate several rules related to XML documents:

1. XML is case sensitive – uppercase and lowercase characters are different.
2. Opening and closing tags must be the same, except the closing tag starts with a slash (/).
3. Tags must balance – every opening tag must have a closing tag.
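The three rules above can be reproduced outside the browser. The sketch below, written in Python with the standard library parser xml.etree.ElementTree, feeds the parser one fragment per rule and reports the line on which parsing failed. The fragments and element names are hypothetical stand-ins, not the contents of NotWellFormed.xml, and the helper name check is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments; the element names are assumptions for illustration.
samples = {
    "case mismatch": "<Names>\n<Name>Homer</name>\n</Names>",
    "missing slash": "<Names>\n<Name>Homer<Name>\n</Names>",
    "missing closing tag": "<Names>\n<Name>Homer\n</Names>",
    "well formed": "<Names>\n<Name>Homer</Name>\n</Names>",
}


def check(xml_text: str) -> str:
    """Return 'ok', or a message naming the line where the parser gave up."""
    try:
        ET.fromstring(xml_text)
        return "ok"
    except ET.ParseError as err:
        line, _column = err.position  # (line, column) reported by the parser
        return f"error on line {line}"


results = {label: check(text) for label, text in samples.items()}
for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

As with the error messages in Internet Explorer, the reported line is where the parser noticed the problem, which may be later than the line containing the actual mistake. This is why working backwards from the reported line is a useful habit.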

XML Hierarchy

An XML document conforms to a hierarchical structure similar to an inverted tree. The root element (or root entity) is the top of the tree and represents the contents of the XML document as a whole (Figure 9). Each element below the root element is either a simple element or a complex element. A simple element represents a single fact. A complex element represents multiple facts clustered together (e.g., an address), multiple occurrences of the same type of information (e.g., family members), or some combination of these.

Figure 9: XML Document Hierarchical Structure

Root Element
    Element 1
        Element 1.1
        Element 1.2
        Element 1.n
            Element 1.n.1
            Element 1.n.2
            Element 1.n.n
    Element 2
    Element n

Figure 10 illustrates the differences between a simple element and a complex element, using a “bank account” structure. In this structure, the account id and balance represent facts about a bank account. A single bank account may have multiple account holders, each with a name and tax id. Based on this structure, a “bank account” is a complex element since it breaks down into a set of elements. The “account id” and “balance” elements are simple elements since they represent individual facts. An “account holder” is a complex element since it contains two simple elements, “holder name” and “holder tax id.” The collection of “account holders” is a complex element since it may consist of multiple occurrences of “account holder.”
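The distinction between simple and complex elements can be expressed as a small test: an element with child elements is complex, and an element with only a value is simple. The Python sketch below applies that test to a hypothetical rendering of the bank account structure. The tag names (BankAccount, AccountId, and so on) and the helper name kind are assumptions, since the tutorial's own markup is shown only in the figures.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the bank account structure; tag names are assumptions.
doc = """<BankAccount>
  <AccountId>90210222</AccountId>
  <Balance>23234079</Balance>
  <AccountHolders>
    <AccountHolder>
      <HolderName>H. Simpson</HolderName>
      <HolderTaxId>24512423</HolderTaxId>
    </AccountHolder>
  </AccountHolders>
</BankAccount>"""


def kind(element: ET.Element) -> str:
    """Classify an element as 'complex' if it has child elements, else 'simple'."""
    return "complex" if len(element) > 0 else "simple"


root = ET.fromstring(doc)
# root.iter() walks the root element and every element below it.
kinds = {element.tag: kind(element) for element in root.iter()}
for tag, label in kinds.items():
    print(f"{tag}: {label}")
```

Note that this test looks only at the data, so an element with no children and no value would also be reported as simple.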

Figure 10: Initial Document Hierarchy

Bank Account
    Account ID
    Balance
    Account Holders
        Account Holder (1)
            Holder Name
            Holder Tax ID
        Account Holder (2)
            Holder Name
            Holder Tax ID
        Account Holder (3)
            Holder Name
            Holder Tax ID

Figure 11 shows sample data in the document hierarchy. Each complex element that breaks down into another level of the hierarchy forms a node. As shown in the sample, a node may contain lower-level nodes. For example, the “account holders” node contains multiple nodes, one for each account holder. The illustration shows three child nodes for the “account holders” parent node; however, nothing defined so far specifies this limit. In the next tutorial, we describe a way to limit the number of child nodes.

Figure 11: Sample Data in the Document Hierarchy

Bank Account
    Account ID: 90210222
    Balance: 23234079
    Account Holders
        Account Holder (1)
            H. Simpson
            24512423
        Account Holder (2)
            M. Szyslak
            53445231
        Account Holder (3)
            N. Flanders
            11234129
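To see how the sample data maps onto parent and child nodes, the hierarchy can be built in code. The Python sketch below constructs the tree with xml.etree.ElementTree, one node per box in the figure. The tag names are assumptions, since the tutorial shows the structure only as a diagram, and ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Sample data from the figure; the tag names below are assumptions.
holders = [
    ("H. Simpson", "24512423"),
    ("M. Szyslak", "53445231"),
    ("N. Flanders", "11234129"),
]

account = ET.Element("BankAccount")                 # root node
ET.SubElement(account, "AccountId").text = "90210222"
ET.SubElement(account, "Balance").text = "23234079"
holders_node = ET.SubElement(account, "AccountHolders")

# Nothing here limits the number of child nodes; the list could be any length.
for name, tax_id in holders:
    holder = ET.SubElement(holders_node, "AccountHolder")
    ET.SubElement(holder, "HolderName").text = name
    ET.SubElement(holder, "HolderTaxId").text = tax_id

ET.indent(account)  # pretty-print; available in Python 3.9 and later
print(ET.tostring(account, encoding="unicode"))
print(len(holders_node))  # number of child nodes under AccountHolders
```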

Open bankacct1.xml in Internet Explorer (Figure 12). Internet Explorer includes a minus sign at each level of the hierarchy so that you can collapse and expand the hierarchy as desired.

Figure 12: BankAcct1.xml in Internet Explorer

The plus sign beside the second tag indicates the node is collapsed.

Open bankacct1.xml in Crimson Editor. Make the following changes. Change the account id to 902102. Change the balance to 2323.40. Save the changes. Refresh the page in Internet Explorer. The changes should display as entered.

Delete the second account holder’s information (M. Szyslak). Be sure to delete the entire node as indicated in Figure 13. Save the changes. Refresh the page in Internet Explorer. The page should look similar to Figure 14.

Figure 13: Deleting the M. Szyslak Node

Figure 14: Updated BankAcct1.xml
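The same edits can be made in code rather than in a text editor. The Python sketch below changes the account id and balance and then removes the M. Szyslak node as a whole, which keeps the document well formed. The markup is a hypothetical stand-in for bankacct1.xml; the tag names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for bankacct1.xml; the tag names are assumptions.
doc = """<BankAccount>
  <AccountId>90210222</AccountId>
  <Balance>23234079</Balance>
  <AccountHolders>
    <AccountHolder>
      <HolderName>H. Simpson</HolderName>
      <HolderTaxId>24512423</HolderTaxId>
    </AccountHolder>
    <AccountHolder>
      <HolderName>M. Szyslak</HolderName>
      <HolderTaxId>53445231</HolderTaxId>
    </AccountHolder>
    <AccountHolder>
      <HolderName>N. Flanders</HolderName>
      <HolderTaxId>11234129</HolderTaxId>
    </AccountHolder>
  </AccountHolders>
</BankAccount>"""

root = ET.fromstring(doc)

# Change the two simple elements.
root.find("AccountId").text = "902102"
root.find("Balance").text = "2323.40"

# Remove the entire node for M. Szyslak, opening tag through closing tag.
holders = root.find("AccountHolders")
for holder in holders.findall("AccountHolder"):
    if holder.findtext("HolderName") == "M. Szyslak":
        holders.remove(holder)

remaining = [holder.findtext("HolderName") for holder in holders]
print(remaining)  # ['H. Simpson', 'N. Flanders']
```

Removing the node through the parser's tree, instead of deleting individual lines, avoids leaving behind an unmatched opening or closing tag.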

Switch back to Crimson Editor and delete lines 10 (N. Flanders) and 11 (11234129). This creates an empty account holder node. Save the changes and refresh the page in Internet Explorer. Figure 15 depicts the XML document with these changes.

Figure 15: XML Document with Empty Node

Internet Explorer displays the empty node as a single tag with a slash before the closing angle bracket (the general form is <element />). This is XML shorthand for an opening tag followed immediately by a closing tag (<element></element>). You can use this shorthand notation as a placeholder in an XML document when you do not have a value to assign to a node yet.
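A parser treats the shorthand and the long form of an empty element as the same thing. The Python sketch below parses both forms and compares the results, then writes the long form back out. The tag name AccountHolder is an assumption used for illustration.

```python
import xml.etree.ElementTree as ET

# The tag name is an assumption; any element name behaves the same way.
short_form = ET.fromstring("<AccountHolder/>")
long_form = ET.fromstring("<AccountHolder></AccountHolder>")

# Both forms yield an element with the same name, no value, and no children.
same = (
    short_form.tag == long_form.tag
    and short_form.text == long_form.text
    and len(short_form) == len(long_form) == 0
)
print(same)  # True

# When writing an empty element, ElementTree uses the shorthand by default.
print(ET.tostring(long_form, encoding="unicode"))  # <AccountHolder />
```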

Summary

This tutorial described the primary components of an XML document. The next tutorial introduces XML schema definitions.

XML Resources

Crimson Editor, www.crimsoneditor.com.
Microsoft Internet Explorer, www.microsoft.com.
