XML Documents

Alphabet Soup: The XML Document

Overview

The first tutorial in this series introduced the core Extensible Markup Language (XML) technologies. This tutorial covers the construction of a well-formed XML document. An XML document contains data marked up using tags that define the meaning of the data. In this tutorial, we use Microsoft Internet Explorer and Crimson Editor, a freeware text editor, to display and edit XML documents.

Well-formed Data

At a minimum, XML requires that data be well formed. All data must nest between an opening tag and a closing tag, the two tags must be identical except that the closing tag has a slash character (/) as a prefix, and the tags must balance.

Well-formed XML Data:

Open Internet Explorer. In Internet Explorer, open WellFormed.xml (click File → Open → Browse, navigate to the location where you stored the XML files, change Files of type to All Files, select WellFormed.xml, click Open, and click OK). Internet Explorer displays the well-formed file as shown in Figure 1.

Figure 1: WellFormed.xml

Each XML document must have exactly one root element. In WellFormed.xml, the root element is Names; the first line shows the <Names> opening tag and the last line shows the </Names> closing tag. Notice that Internet Explorer shows all tags and data.

Click the minus sign (-) beside the <Names> opening tag. Internet Explorer collapses the selected element as shown in Figure 2. Click the plus sign (+) to expand the element so that the document looks like Figure 1 again.

Figure 2: Collapsed WellFormed.xml

As shown in Figure 1, each individual name element sits between a <Name> opening tag and a </Name> closing tag. You have probably noticed that most of the names do not make much sense. Do not worry about this yet; we will address the actual contents of the elements in the next tutorial.

Problems with XML Data:

XML data are not well formed if there are any problems with the tags used or the tags do not balance.
When an XML-enabled application, such as Internet Explorer, encounters a problem with XML data, the application displays an error message. Open NotWellFormed.xml in Internet Explorer. Internet Explorer displays an error message similar to Figure 3.

Figure 3: XML Error 1

On line 3, the closing tag starts with a lowercase letter but the opening tag starts with an uppercase letter. Leave Internet Explorer open and open NotWellFormed.xml in Crimson Editor. Correct the case of the closing tag (Figure 4). Save the file.

Figure 4: Corrected Closing Tag on Line 3

Refresh the page in Internet Explorer. Internet Explorer detects another error and displays an error message (Figure 5).

Figure 5: XML Error 2

In this case, Internet Explorer reports that the </Names> closing tag and the <Name> opening tag do not balance. To track down the source of this error, start with the line indicated in Figure 5 and work backwards. Notice that the element on line 4 does not have a closing tag. In Crimson Editor, add a closing tag (</Name>) on line 4 as shown in Figure 6. Save the file.

Figure 6: Line 4 Correction

Refresh the page in Internet Explorer. Unfortunately, Internet Explorer reports the same error as in Figure 5. This means a problem with tags balancing still exists. Working backwards starting at line 4, verify that each element has a valid opening tag and closing tag. Lines 3 and 4 are OK; however, the closing tag in line 2 does not have a slash. Add the slash to the closing tag (Figure 7). Save the file.

Figure 7: Slash in Closing Tag

Refresh the page in Internet Explorer. At this point, the document is well formed (Figure 8).

Figure 8: Corrected XML Document

The problems encountered illustrate several rules related to XML documents:

1. XML is case sensitive: uppercase and lowercase characters are different.
2. Opening and closing tags must be the same, except that the closing tag starts with a slash (/).
3. Tags must balance: every opening tag must have a closing tag.
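The three rules above can also be checked with any XML parser, not only a browser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample documents are hypothetical stand-ins, since the contents of WellFormed.xml and NotWellFormed.xml are not reproduced in this tutorial, and the function name well_formed_error is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET


def well_formed_error(xml_text):
    """Return None if xml_text is well formed, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)


# Hypothetical documents, one per rule; they are NOT the tutorial's actual files.
SAMPLES = {
    "well formed": "<Names><Name>Alpha</Name><Name>Beta</Name></Names>",
    "case mismatch": "<Names><Name>Alpha</name></Names>",        # rule 1
    "missing slash": "<Names><Name>Alpha<Name></Names>",         # rule 2
    "missing closing tag": "<Names><Name>Alpha</Names>",         # rule 3
}

for label, text in SAMPLES.items():
    error = well_formed_error(text)
    print(f"{label}: {'OK' if error is None else error}")
```

As in Internet Explorer, the parser stops at the first problem it finds, so a document with several errors has to be corrected and re-checked one error at a time.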
XML Hierarchy

An XML document conforms to a hierarchical structure similar to an inverted tree. The root element (or root entity) is the top of the tree and represents the contents of the XML document as a whole (Figure 9). Each element below the root element is either a simple element or a complex element. A simple element represents a single fact. A complex element represents multiple facts clustered together (e.g., an address), multiple occurrences of the same type of information (e.g., family members), or some combination of these.

Figure 9: XML Document Hierarchical Structure
[Tree diagram: the Root Element branches into Element 1, Element 2, ... Element n; Element 1 branches into Element 1.1, Element 1.2, ... Element 1.n; Element 1.n branches into Element 1.n.1, Element 1.n.2, ... Element 1.n.n.]

Figure 10 illustrates the differences between a simple element and a complex element, using a "bank account" structure. In this structure, the account id and balance represent facts about a bank account. A single bank account may have multiple account holders, each with a name and tax id. Based on this structure, a "bank account" is a complex element since it breaks down into a set of elements. The "account id" and "balance" elements are simple elements since they represent individual facts. An "account holder" is a complex element since it contains two simple elements, "holder name" and "holder tax id." The collection of "account holders" is a complex element since it may consist of multiple occurrences of "account holder."

Figure 10: Initial Document Hierarchy
[Tree diagram: Bank Account branches into Account ID, Balance, and Account Holders; Account Holders branches into Account Holder (1), (2), and (3); each Account Holder branches into Holder Name and Holder Tax ID.]

Figure 11 shows sample data in the document hierarchy. Each complex element that breaks down into another level of the hierarchy forms a node. As shown in the sample, a node may contain lower-level nodes. For example, the "account holders" node contains multiple nodes, one for each account holder.
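The distinction between simple and complex elements can be sketched in code by walking the tree and checking whether each element has child elements. The example below is an illustration only. The tag names AccountHolder, HolderName, and HolderTaxID appear in this tutorial, but the remaining tag names (BankAccount, AccountID, Balance, AccountHolders), all of the values, and the function name classify are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank account document; values and several tag names are assumed.
BANK_ACCOUNT = """
<BankAccount>
  <AccountID>1001</AccountID>
  <Balance>250.00</Balance>
  <AccountHolders>
    <AccountHolder>
      <HolderName>A. Example</HolderName>
      <HolderTaxID>111</HolderTaxID>
    </AccountHolder>
    <AccountHolder>
      <HolderName>B. Example</HolderName>
      <HolderTaxID>222</HolderTaxID>
    </AccountHolder>
  </AccountHolders>
</BankAccount>
""".strip()


def classify(element, depth=0, rows=None):
    """Walk the tree and record (depth, tag, kind) for every element."""
    if rows is None:
        rows = []
    # Simplification: an element with child elements is "complex", otherwise "simple".
    kind = "complex" if len(element) > 0 else "simple"
    rows.append((depth, element.tag, kind))
    for child in element:
        classify(child, depth + 1, rows)
    return rows


rows = classify(ET.fromstring(BANK_ACCOUNT))
for depth, tag, kind in rows:
    print("  " * depth + f"{tag} ({kind})")
```

The indentation of the printed output mirrors the levels of the hierarchy. Classifying by child count alone is a shortcut: it looks only at the data present in one document, not at how the structure is defined.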
The illustration shows three child nodes for the "account holders" parent node; however, nothing defined so far specifies this limit. In the next tutorial, we describe a way to limit the number of child nodes.

Figure 11: Sample Data in the Document Hierarchy
[Tree diagram: Bank Account with Account ID 90210222 and Balance 23234079; the Account Holders node contains H. Simpson (24512423), N. Flanders (11234129), and M. Szyslak (53445231).]

Open bankacct1.xml in Internet Explorer (Figure 12). Internet Explorer includes a minus sign at each level of the hierarchy so that you can collapse and expand the hierarchy as desired.

Figure 12: BankAcct1.xml in Internet Explorer

The plus sign beside the second <AccountHolder> tag indicates the node is collapsed.

Open bankacct1.xml in Crimson Editor. Make the following changes. Change the account id to 902102. Change the balance to 2323.40. Save the changes. Refresh the page in Internet Explorer. The changes should display as entered.

Delete the second account holder's information (M. Szyslak). Be sure to delete the entire node as indicated in Figure 13. Save the changes. Refresh the page in Internet Explorer. The page should look similar to Figure 14.

Figure 13: Deleting the M. Szyslak Node

Figure 14: Updated BankAcct1.xml

Switch back to Crimson Editor and delete lines 10 (<HolderName>N. Flanders</HolderName>) and 11 (<HolderTaxID>11234129</HolderTaxID>). This creates an empty account holder node. Save the changes and refresh the page in Internet Explorer. Figure 15 depicts the XML document with these changes.

Figure 15: XML Document with Empty Node

Internet Explorer displays the empty node as <AccountHolder />. This is XML shorthand for an opening tag followed immediately by a closing tag (<AccountHolder></AccountHolder>). You can use this shorthand notation as a placeholder in an XML document when you do not yet have a value to assign to a node.

Summary

This tutorial described the primary components of an XML document.
The next tutorial introduces XML schema definitions.

XML Resources

Crimson Editor, www.crimsoneditor.com.
Microsoft Internet Explorer, www.microsoft.com.
