XML and Content Management
Total Page:16
File Type:pdf, Size:1020Kb
XML and Content Management Lecture 4: More on XML Schema and patterns in schema design Maciej Ogrodniczuk MIMUW, 22 October 2012 Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 1 XML Schema: last week summary XML Schema is a formalism for defining document schemata. It has (at least what we know so far): XML syntax, types: assigned to elements and attributes, set of predefined types, means for defining own types, more flexible definition of content models: occurrences, order in mixed content, constraints on uniqueness and references, ... Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 2 Schema-document binding The binding is composed of 3 elements: declaration of the namespace for XML Schema-compatible document instance: xmlns:xsi= "http://www.w3.org/2001/XMLSchema-instance", binding the schema for elements not in any namespace — by specifying the schema URL in xsi:noNamespaceSchemaLocation attribute, binding the list of namespaces with URLs of schemata used for validation of elements which names are belong to namespaces used in the document — in xsi:schemaLocation attribute. Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 3 Schema-document binding <?xml version="1.0"?> <text xmlns:xsi="http://www.w3.org/2001/ XMLSchema-instance" xsi:noNamespaceSchemaLocation="text.xsd" xsi:schemaLocation="http://www.example.org/equations equations.xsd http://www.example.org/graphs graphs.xsd"> ... </text> Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 4 Validation model Multilevel validation: 1 check (recursively) whether schema documents are valid (according to a schema for XML Schema), 2 check whether the document satisfies the requirements specified in the schema. Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 5 Target namespace in a schema document If names of elements, attributes and types defined in the schema document should belong to a certain namespace, it must be specified in targetNamespace attribute of <xsd:schema> (root) element. Without this attribute names of resulting components will not belong to any namespace. Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 6 targetNamespace="http://www.example.org/test" xmlns="http://www.example.org/test" Target namespace in a schema document OK: <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" > <xsd:complexType name="baseType"> ... </xsd:complexType> <xsd:complexType name="derivedType"> <xsd:complexContent> <xsd:restriction base="baseType"> ... </xsd:restriction> </xsd:complexContent> </xsd:complexType> <xsd:element name="element" type="derivedType"> </xsd:schema> Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 6 targetNamespace="http://www.example.org/test" xmlns="http://www.example.org/test" Target namespace in a schema document Not enough! <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.example.org/test" > <xsd:complexType name="baseType"> ... </xsd:complexType> <xsd:complexType name="derivedType"> <xsd:complexContent> <xsd:restriction base="baseType"> ... </xsd:restriction> </xsd:complexContent> </xsd:complexType> <xsd:element name="element" type="derivedType"> </xsd:schema> Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 6 Target namespace in a schema document OK: <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.example.org/test" xmlns="http://www.example.org/test"> <xsd:complexType name="baseType"> ... </xsd:complexType> <xsd:complexType name="derivedType"> <xsd:complexContent> <xsd:restriction base="baseType"> ... </xsd:restriction> </xsd:complexContent> </xsd:complexType> <xsd:element name="element" type="derivedType"> </xsd:schema> Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 6 Target namespace in a schema document OK: <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.example.org/test" xmlns:types="http://www.example.org/test"> <xsd:complexType name="baseType"> ... </xsd:complexType> <xsd:complexType name="derivedType"> <xsd:complexContent> <xsd:restriction base="types:baseType"> ... </xsd:restriction> </xsd:complexContent> </xsd:complexType> <xsd:element name="element" type="types:derivedType"> </xsd:schema> Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 6 Document schemata and schema documents Schema (a logical structure) can be represented in many schema documents (.xsd files). XML Schema specification defined 3 methods of schema composition: include, import, redefine, Locations of documents describing the schema are defined in the instance and: the processing tool may use schema documents from predefined locations, schema document locations may be specified from the command line. Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 7 Schema decomposition using <xsd:include> include links the schema document with the target namespace of the main schema document. Included document must have identical target namespace as the main document or have no target namespace. <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.example.org/pcs" targetNamespace="http://www.example.org/pcs"> <xsd:include schemaLocation="products.xsd"/> ... </xsd:schema> Note: included schemata do not have to be complete: bad – we need to take care of dependencies between schemata, good – we need to parameterize schemata (e.g. by defining different types for elements with different names). Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 8 <xsd:import>: real life example <xsd:import namespace="http://www.w3.org/1999/xhtml" schemaLocation="http://www.w3.org/2002/08/ xhtml/xhtml1-strict.xsd"/> <xsd:complexType name="XHTMLCodeType"> <xsd:sequence> <xsd:any namespace="http://www.w3.org/1999/xhtml" processContents="skip"/> </xsd:sequence> </xsd:complexType> <xsd:element name="xhtml" type="XHTMLCodeType"> <doc xsi:noNamespaceSchemaLocation="test2.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:html="http://www.w3.org/1999/xhtml"> <html:body> ... </html:body> </doc> Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 10 More on types Types — different classifications: Simple and complex types, atomic and multivalue types (lists and unions), ur-types, primitive types and derived types: ur-types — anyType and anySimpleType, primitive types — string, decimal, ... derived types — normalizedString, integer, ... built-in types and user-derived types, named and anonymous types. Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 11 Complex type with a simple content Problem: Define an element with text content and an attribute. Solution: <xsd:element name="amount-in-words"> <xsd:complexType> <xsd:simpleContent> <xsd:extension base="xsd:string"> <xsd:attribute name="value" type="xsd:positiveInteger"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> </xsd:element> Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 12 Derivation by extension In two ways: by extending simple content – adding attributes to a simple type or a complex type with simple content, by extending complex content – adding new elements or attributes to the base type. Two notes: values of the base type do not have to be proper values of the derived type (extension can e.g. add required elements or attributes), when we extend complex content, the content model of the base type does not have to be repeated — the processor will add a new model after the content model of the base type, as if both of them were in a sequence. Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 13 Extending complex types <xsd:complexType name="publicationType"> <xsd:sequence> <xsd:element name="title" maxOccurs="unbounded"/> <xsd:element name="author" maxOccurs="unbounded"/> <xsd:element name="pubYear" type="xsd:year"/> </xsd:sequence> </xsd:complexType> <xsd:complexType name="bookType"> <xsd:complexContent> <xsd:extension base="publicationType"> <xsd:sequence> <xsd:element name="ISBN"/> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 14 Derivation by restriction Derivation can be achieved: for simple content — using facets, for complex content — by: constraining occurrence (minOccurs, maxOccurs), removing optional elements in sequence and all groups, selecting a subset of elements in choice groups, restricting subelement types. Restricting attributes means: restriction of attribute type, constraining attribute occurrence (from optional to required or prohibited), adding, modifying or removing a default value, adding a fixed value. Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 15 Restriction of simple content and attributes: an example <xsd:complexType name="timePeriodType"> <xsd:simpleContent> <xsd:extension base="xsd:positiveInteger"> <xsd:attribute name="unit"/> <xsd:attribute name="miliseconds" type="xsd:positiveInteger"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 16 Restriction