XML and Content Management
Lecture 4: More on XML Schema and patterns in schema design
Maciej Ogrodniczuk
MIMUW, 22 October 2012
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 1 XML Schema: last week summary
XML Schema is a formalism for defining document schemata.
It has (at least what we know so far): XML syntax, types: assigned to elements and attributes, set of predefined types, means for defining own types, more flexible definition of content models: occurrences, order in mixed content, constraints on uniqueness and references, ...
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 2 Schema-document binding
The binding is composed of 3 elements: declaration of the namespace for XML Schema-compatible document instance: xmlns:xsi= "http://www.w3.org/2001/XMLSchema-instance", binding the schema for elements not in any namespace — by specifying the schema URL in xsi:noNamespaceSchemaLocation attribute, binding the list of namespaces with URLs of schemata used for validation of elements which names are belong to namespaces used in the document — in xsi:schemaLocation attribute.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 3 Schema-document binding
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 4 Validation model
Multilevel validation:
1 check (recursively) whether schema documents are valid (according to a schema for XML Schema), 2 check whether the document satisfies the requirements specified in the schema.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 5 Target namespace in a schema document
If names of elements, attributes and types defined in the schema document should belong to a certain namespace, it must be specified in targetNamespace attribute of
Without this attribute names of resulting components will not belong to any namespace.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 6 targetNamespace="http://www.example.org/test"
xmlns="http://www.example.org/test"
Target namespace in a schema document
OK: >
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 6 targetNamespace="http://www.example.org/test" xmlns="http://www.example.org/test"
Target namespace in a schema document
Not enough!
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 6 Target namespace in a schema document
OK:
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 6 Target namespace in a schema document
OK:
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 6 Document schemata and schema documents
Schema (a logical structure) can be represented in many schema documents (.xsd files). XML Schema specification defined 3 methods of schema composition: include, import, redefine,
Locations of documents describing the schema are defined in the instance and: the processing tool may use schema documents from predefined locations, schema document locations may be specified from the command line.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 7 Schema decomposition using
include links the schema document with the target namespace of the main schema document. Included document must have identical target namespace as the main document or have no target namespace.
Note: included schemata do not have to be complete: bad – we need to take care of dependencies between schemata, good – we need to parameterize schemata (e.g. by defining different types for elements with different names).
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 8
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 10 More on types
Types — different classifications: Simple and complex types, atomic and multivalue types (lists and unions), ur-types, primitive types and derived types: ur-types — anyType and anySimpleType, primitive types — string, decimal, ... derived types — normalizedString, integer, ... built-in types and user-derived types, named and anonymous types.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 11 Complex type with a simple content
Problem: Define an element with text content and an attribute.
Solution:
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 12 Derivation by extension
In two ways: by extending simple content – adding attributes to a simple type or a complex type with simple content, by extending complex content – adding new elements or attributes to the base type.
Two notes: values of the base type do not have to be proper values of the derived type (extension can e.g. add required elements or attributes), when we extend complex content, the content model of the base type does not have to be repeated — the processor will add a new model after the content model of the base type, as if both of them were in a sequence.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 13 Extending complex types
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 14 Derivation by restriction
Derivation can be achieved: for simple content — using facets, for complex content — by: constraining occurrence (minOccurs, maxOccurs), removing optional elements in sequence and all groups, selecting a subset of elements in choice groups, restricting subelement types.
Restricting attributes means: restriction of attribute type, constraining attribute occurrence (from optional to required or prohibited), adding, modifying or removing a default value, adding a fixed value.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 15 Restriction of simple content and attributes: an example
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 16 Restriction of simple content and attributes: an example
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 16 Restriction of simple content and attributes: an example
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 16 Restricting complex content: an example
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 18 Restriction and extension at the same time
Let’s define
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 19 Benefits of deriving types
Building documents can be more flexible with the type hierarchy: when the base type is assigned to an element in the schema, and the derived type is used in the document.
Using derived type requires pointing at it explicitly in xsi:type attribute coming from the document instance namespace (http://www.w3.org/2001/XMLSchema-instance). Note: base types can be explicitly marked in the schema as abstract (setting the abstract attribute value to true) which implies using a derived type in the document (and pointing to it with a xsd:type attribute).
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 20 Using derived types: an example
The schema:
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 21 Using derived types: an example
The schema:
The document:
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 21 Good common practice of schema design
1 readability: schemata which are easy to understand are easier to maintain and they have higher chances to be often used and reused, 2 precision: properly constructed types can eliminate errors in data (important for exchanging data with applications out of our control), 3 reuse: usually means savings and better design, „less means more”, 4 flexibility and extensibility: helps satisfying various user requirements and further schema reuse, supports change management.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 22 10 concrete hints
By Priscilla Walmsley:
1 which names to use? 2 specific or general names of properties? 3 how to resolve name conflicts? 4 xsd:string, xsd:normalizedString or xsd:token? 5 grouping elements? 6 nil values, 7 lists of values, 8 namespaces, 9 global or local declaration of elements? 10 named or anonymous types?
http://www.datypic.com/services/xmldesign/XMLDesignColor.pdf
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 23 Names of schema components
Names should be: meaningful and easy to remember, not too much magical or abbrieviated (PTNM), handy (toyShopConstructionSiteAddress), used consistently (ZipCode, zip-code, zip code, zip.code, zipCode), with standardized vocabulary (zipCode, postCode, postalCode, kodPocztowy), prefixed or suffixed by types and groups.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 24 Specific or general names of properties?
Specific names:
General names:
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 25 Name conflicts
Is such definition correct?
General rule: elements and attributes can have the same names, types can have the same names as elements or attributes, two types cannot have the same name.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 26 xsd:string, xsd:normalizedString or xsd:token?
xsd:string when whitespace formatting is important, for long texts — using mixed content can be also considered.
xsd:normalizedString when whitespace formatting is not important, but positions of characters are,
xsd:token best for short texts, particularly those restricted by enumeration or pattern.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 27 Grouping elements
Without a grouping element:
With a grouping element:
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 28 Nil values
Nil values represent undefined content of an XML element. Usage:
1 Possibility of carrying a nil value is assigned to an element in a schema by marking it with nillable="true" attribute. 2 Nillable elements can be marked with a special xsi:nil attribute (from the namespace for document instance — http://www.w3.org/2001/XMLSchema-instance) with true value, showing that the value is undefined.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 29 Nil values: an example
Schema:
Usage in the document:
Notes: Element with nil value must be empty. Nillability is stronger than defined content model. Attributes of a nil element must follow the model in all respects.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 30 Good practices: methods of representation of a nil value
Empty value of an attribute can be represented in one way only: use="optional". For elements there are many more ways, e.g.:
1 no element, 2 empty element, 3 nil value.
Usage:
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 31 Good practices: methods of representation of a nil value
Example:
Note: empty string (with zero length) can be represented with an empty element or no element, undefined string can be represented with nil value or no element, but when we define a structure when occurrence of elements matters (e.g. due to its constant size), lack of element is a bad representation.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 31 Nil values
Using nil values: does not impair the definition by allowing empty content, allows unambiguous definition that the information does not exist, allows passing information about indeterminacy without removing the element from content (presence of the element may be used by the application), allows turning off default values.
Nil values cannot be used for non-string values, unless... some trick is applied:
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 32 Lists of values
An example:
Challenges: 1 potentially frequent changes, often out of control of the schema designer (language, currency, country codes) → worth defining in separate schema documents which allows versioning, 2 length of lists slowing down validation, littering the schema, hindering schema management → worth limiting list length to 15-20 values, document the lists, use patterns, 3 inextensibility of the simple types → xsd:token or its union with enumeration can be used to let other documents narrow the list.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 33 Extensible list of values: an example
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 34 Namespaces
General ideas: namespaces should be declared in root element, default namespace should be used only when it dominates the document, prefixes do not matter, but conventions should be applied (e.g. xsd or xs for schemata...)
3 ways of using namespaces in the schema:
1 target namespace as default namespace (most frequent solution), 2 XML Schema namespace as default, 3 no default namespace.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 35 Default target namespace
Notation:
Notes: XML Schema components are always prefixed, references to components defined in the schema without the prefix (apart from XPaths in
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 36 BUT: target namespace necessary to reference defined components!
Default XML Schema namespace
Notation:
Notes: XML Schema components are not prefixed, references to components defined in the schema are prefixed.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 37 Default XML Schema namespace
Notation:
Notes: XML Schema components are not prefixed, references to components defined in the schema are prefixed. BUT: target namespace necessary to reference defined components!
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 37 No default namespace
Notation:
Notes: every reference with a prefix (apart from new names), best solution while having many namespaces.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 38 Global or local element declarations?
Global declarations:
Local declarations:
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 39 Global or local element declarations?
Schema definition does not name the allowed root elements – any element can become root. When just selected root elements are allowed, only possible roots should stay global (and all other elements local). To reuse local elements, they can be grouped:
instead of
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 39 Named or anonymous types?
Named types:
Anonymous types:
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 40 3 methods of schema documentation
Using:
1 a dedicated element
or in non-schema-related ways: describing the schema outside it, storing the schema together with descriptions of components, allowed transformations and styles – in any format, ...
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 41
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 42
Both elements can carry an optional source attribute documenting the source of information.
Lecture 4: More on XML Schema and patterns in schema design XML and Content Management 43 What is really
Can be used for: storing additional validation rules (e.g. for Schematron), mapping to other technologies (e.g. relational schema), mapping to forms (e.g. XHTML