CIS 680 XML Pointer Language (XPointer)

XPointer, the XML Pointer Language, defines an addressing scheme for individual parts of an XML document. These addresses can be used by any application that needs to identify parts of or locations in an XML document. For instance, an XML editor could use an XPointer to identify the current position of the insertion point or the range of the selection. An XInclude processor can use an XPointer to determine what part of a document to include. And the URI in an XLink can include an XPointer fragment identifier that locates one particular element in the targeted document.

XPointers use the same XPath syntax that you're familiar with from XSL transformations to identify the parts of the document they point to, along with a few additional pieces.

For the moment, an XPointer can be used as an index into a complete document: the whole document is loaded and then positioned at the location the XPointer identifies. Even this much is more than most browsers can handle. In the long term, extensions to XML, XLink, HTTP, and other protocols may allow more sophisticated uses of XPointers. For instance, XInclude will let you quote a remote document by using an XPointer to tell browsers which part of the original document to copy, rather than retyping the text of the quote. You could include cross-references inside a document that automatically update themselves as the document is revised. These uses, however, will have to wait for the development of several next-generation technologies. For now, you must be content with precisely identifying the part of a document you want to jump to when following an XLink.

Points

Selecting a particular element or node is almost always good enough for pointing into well-formed XML documents. However, on occasion you may need to point into XML data in which large chunks of non-XML text are embedded via CDATA sections, comments, processing instructions, or some other means.
In these cases, you may need to refer to particular ranges of text in the document that don't map onto any particular markup element. Or you may need to point into non-XML substructure in the text content of particular elements; for example, the month in a BORN element that looks like this:

<BORN>11 Feb 1858</BORN>

An XPath expression can identify an element node, an attribute node, a text node, a comment node, or a processing instruction node. However, it can't indicate the first two characters of the BORN element (the day) or the substring of text between the first space and the last space in the BORN element (the month).

XPointer generalizes XPath to allow identifiers like this. An XPointer can address points in the document and ranges between points. These may not correspond to any one node. For instance, the place between the X and the P in the word XPointer at the beginning of this paragraph is a point. The place between the t and the h in the word this at the end of the first sentence of this paragraph is another point. The text fragment "Pointer generalizes XPath to allow identifiers like t" between those two points is a range.

Every point is either between two nodes or between two characters in the parsed character data of a document. To make sense of this, you have to remember that parsed character data is part of a text node. For instance, consider this very simple but well-formed XML document:

<GREETING>
Hello
</GREETING>

There are exactly 3 nodes and 14 distinct points in this document. The nodes are the root node, which contains the GREETING element node, which contains a text node. In order, the points are:

1. The point before the root node
2. The point before the GREETING element node
3. The point before the text node containing the text "Hello" (as well as any white space)
4. The point before the line break between <GREETING> and Hello
5. The point before the H in Hello
6. The point between the H and the e in Hello
7. The point between the e and the l in Hello
8. The point between the l and the l in Hello
9. The point between the l and the o in Hello
10. The point after the o in Hello
11. The point after the line break between Hello and </GREETING>
12. The point after the text node containing the text "Hello"
13. The point after the GREETING element
14. The point after the root node

Points allow XPointers to indicate arbitrary positions in the parsed character data of a document. They do not, however, enable pointing at a position in the middle of a tag. In essence, points let you treat text content as if it were broken up into smaller nodes, one for each character.

One way to select a point is to use the string-range() function to select a range, then use the start-point() or end-point() function to extract the first or last point from the range. For example, this XPointer selects the point immediately before the D in Domeniquette Celeste Baudean's NAME element:

xpointer(start-point(string-range(id('p1')/NAME,"Domeniquette")))

This XPointer selects the point after the last e in Domeniquette:

xpointer(end-point(string-range(id('p1')/NAME,"Domeniquette")))

You can also take the start-point() or end-point() of an element, text, comment, processing instruction, or root node to get the first or last point in that node.

Ranges

Some applications need to specify a range across a document rather than a particular point in the document. For instance, the selection a user makes with a mouse is not necessarily going to match up with any one element or node. It may start in the middle of one paragraph, extend across a heading and a picture, and then into the middle of another paragraph two pages down. Any contiguous area of a document can be described with a range. A range begins at one point and continues until another point. The start and end points are each identified by a location path.
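Returning to the GREETING example in the Points section, the count of 14 points can be checked mechanically. The Python sketch below is a simplified model rather than an XPointer processor: it assumes the counting convention of the numbered list (one point before and one after each node, plus one point at each character position of every text node, so a text node of n characters contributes n + 1 points) and uses the standard library's xml.dom.minidom parser. The function name enumerate_points and the labels it produces are hypothetical.

```python
from xml.dom.minidom import parseString


def enumerate_points(xml_text):
    """List the points of a document in document order.

    Simplified model (an assumption matching the numbered list above):
    one point before and one after every node, plus one point at each
    character position inside every text node.
    """
    doc = parseString(xml_text)
    points = []

    def walk(node, label):
        points.append(f"before {label}")
        if node.nodeType == node.TEXT_NODE:
            data = node.data
            # A text node of n characters has n + 1 character points.
            for i in range(len(data) + 1):
                points.append(f"char point {i} of {data!r}")
        for child in node.childNodes:
            walk(child, child.nodeName)
        points.append(f"after {label}")

    walk(doc, "root")  # label the document node as the root node
    return points


for p in enumerate_points("<GREETING>\nHello\n</GREETING>"):
    print(p)
```

Run on this document, it prints 14 labels in the same order as the numbered list; the eight character points come from the seven characters of the text node (two line breaks plus Hello).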
If the starting path points to a node set rather than a single point, then range-to() will return multiple ranges, one starting from the first point of each node in the set. To specify a range, you append /range-to(...) to a location path specifying the start point of the range. The parentheses contain a location path specifying the end point of the range. For example, suppose you want to select everything between the first <PERSON> start tag and the last </PERSON> end tag in the family tree document. This XPointer accomplishes that:

xpointer(/child::FAMILYTREE/child::PERSON[position()=1]/range-to(/child::FAMILYTREE/child::PERSON[position()=last()]))

Range functions

XPointer includes several functions specifically for working with ranges. Most of these operate on location sets. A location set is just a node set that can also contain points and ranges, as well as nodes.

The range(location-set) function returns a location set containing one range for each location in the argument. The range is the minimum range necessary to cover the entire location. In essence, this function converts locations to ranges.

The range-inside(location-set) function returns a location set containing the interiors of each of the locations in the input. That is, if one of the locations is an element, then the location returned is the content of the element (but not including the start and end tags). However, if the input location is a range, its interior is the same range, and if it is a point, the result is a collapsed range that starts and ends at that point.

The start-point(location-set) function returns a location set that contains the first point of each location in the input location set. For example, start-point((//PERSON)[1]) returns the point immediately after the first <PERSON> start tag in the document. start-point(//PERSON) returns the set of points immediately after each <PERSON> start tag.
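The node cases of range(), range-inside(), and start-point() can be sketched in Python under one stated assumption: following the XPointer xpointer() scheme drafts, a point is modeled as a (container node, index) pair, where the index counts child nodes inside an element or the root node and characters inside a text, comment, or processing-instruction node. The helper names (node_length, start_point, end_point, range_inside, covering_range) and the SAMPLE fragment are hypothetical; end_point mirrors start_point as the next paragraph describes. Input locations that are already points or ranges are not handled.

```python
from xml.dom.minidom import parseString

# Assumed model: a point is (container, index); a range is (start, end).


def node_length(node):
    """Index of the last point inside node: characters or child count."""
    if node.nodeType in (node.TEXT_NODE, node.COMMENT_NODE,
                         node.PROCESSING_INSTRUCTION_NODE):
        return len(node.data)
    return len(node.childNodes)


def start_point(node):
    """First point in node; for an element, just after its start tag."""
    return (node, 0)


def end_point(node):
    """Last point in node; for an element, just before its end tag."""
    return (node, node_length(node))


def range_inside(node):
    """Interior of node, excluding an element's own start and end tags."""
    return (start_point(node), end_point(node))


def covering_range(node):
    """Minimal range covering the whole node, tags included (like range())."""
    if node.nodeType == node.ATTRIBUTE_NODE:
        raise ValueError("attributes have no covering range in this model")
    parent = node.parentNode
    if parent is None:  # the root (document) node covers all its children
        return range_inside(node)
    i = list(parent.childNodes).index(node)
    return ((parent, i), (parent, i + 1))


# A cut-down, hypothetical fragment of the family tree document.
SAMPLE = ("<FAMILYTREE>"
          "<PERSON ID='p1'><NAME>Domeniquette Celeste Baudean</NAME></PERSON>"
          "<PERSON ID='p2'/>"
          "</FAMILYTREE>")

doc = parseString(SAMPLE)
tree = doc.documentElement
first = tree.getElementsByTagName("PERSON")[0]
print(start_point(first)[1])                   # 0: just after <PERSON ID='p1'>
print(range_inside(first)[1][1])               # 1: PERSON has one child, NAME
print([p[1] for p in covering_range(first)])   # [0, 1]: offsets in FAMILYTREE
```

Keeping range_inside and covering_range separate mirrors the text's distinction: the interior excludes the element's tags, while the covering range is expressed in the parent's coordinates and so includes them.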
The end-point(location-set) function acts the same as start-point() except that it returns the last point of each location in its input; for an element, that is the point immediately before its end tag. For example, end-point(//PERSON) returns the set of points immediately before each </PERSON> end tag.

String ranges

XPointer provides some very basic string-matching capabilities through the string-range() function. This function takes as arguments a location set to search and a substring to search for. It returns a location set containing one range for each nonoverlapping matching substring. You can also provide optional index and length arguments indicating at which character of the match the range should start and how many characters the range should contain. The basic syntax is:

string-range(location-set, substring, index, length)

The first argument is an XPath expression that returns a location set specifying which part of the document to search for a matching string. The second argument, substring, is the actual string to search for. By default, the range returned starts before the first matched character and encompasses all the matched characters. However, the index argument, which counts from 1 and defaults to 1, can give a larger number to start later in the match. For instance, setting it to 2 makes the range start before the second matched character rather than the first. The length argument can specify how many characters to include in the range; by default, the range extends to the end of the match.

A string range points to an occurrence of a specified string, or a substring of a given string, in the text (not markup) of the document. For example, this XPointer finds all occurrences of the string Harold:

xpointer(string-range(/,"Harold"))

You can change the first argument to specify which nodes you want to look in.
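The matching rules for string-range() can be sketched on a plain string. This Python version is a deliberate simplification: it searches a single Python string standing in for the text of one location, ignores markup and the mapping of offsets back to individual text nodes, and returns (start, end) character offsets instead of ranges. As described above, index counts from 1 within each match and length defaults to the end of the match. The name string_range is hypothetical.

```python
def string_range(text, substring, index=1, length=None):
    """Return one (start, end) offset pair per non-overlapping match.

    index: 1-based position within the match where the range starts
           (1 = immediately before the first matched character).
    length: number of characters in the range; None means the range
            runs to the end of the match.
    """
    if not substring:
        raise ValueError("this sketch requires a non-empty search string")
    ranges = []
    pos = text.find(substring)
    while pos != -1:
        start = pos + index - 1
        end = pos + len(substring) if length is None else start + length
        ranges.append((start, end))
        # Resume searching after the whole match so matches never overlap.
        pos = text.find(substring, pos + len(substring))
    return ranges


name = "Domeniquette Celeste Baudean"
print(string_range(name, "Domeniquette"))         # [(0, 12)]
s, e = string_range(name, "Domeniquette", 2, 3)[0]
print(name[s:e])                                  # ome
print(string_range("11 Feb 1858", "Feb"))         # [(3, 6)]: the BORN month
print(string_range("aaaa", "aa"))                 # [(0, 2), (2, 4)]
```

In this model, the start-point() and end-point() of each returned range correspond to the two offsets in its pair, which is how the Domeniquette examples in the Points section pick out a single point.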
