Richer Anonymous Content Types

Rick Jelliffe
Academia Sinica Computing Centre
Taipei, Taiwan

1999-06-06

In SGML terminology in IS 8879:1986, the keywords EMPTY and ANY are not content models but declared content types. SGML also provides CDATA and RCDATA declared content types.

XML allows elements and attributes to be added willy-nilly, within the boundaries of well-formedness. To cope with this, the first XML Schema draft introduces the notions of open and closed content models: the closed content model acts like an XML content model; the open content model also allows any other elements, and so acts like a promiscuous global inclusion.

There are other kinds of declared content types that may be useful for XML Schemas, in particular for coping with generic wrapper elements. The ones suggested here are:

SINGLE

This declared content type allows the element to contain exactly one child element, and no data content.

PAIR

This declared content type allows the element to contain exactly two child elements, of different types, and no data content.

PAIRS

This declared content type allows the element to contain one or more pairs of child elements of two different types, in the pattern (a, b)+, and no data content.

SAME

This declared content type allows the element to contain zero or more of one element type only, and no data content.

LEAF

This declared content type allows the element to contain either a single element that has no subelements, or only #PCDATA data content.

LEAVES

This declared content type allows the element to contain any mixture of elements that have no subelements and #PCDATA data content.

UNIQUE

This declared content type allows the element to contain any number of elements, with no data content. No element type may be repeated within the element.

NONRECURSIVE

This declared content type allows the element to contain any content, except that it cannot directly contain an element of its own type.
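The definitions above can be read as simple predicates over an element's direct children. The following is a minimal sketch of that reading, not part of any schema language: it assumes Python's standard `xml.etree.ElementTree`, treats whitespace-only text as "no data content", and uses the hypothetical helper names `has_text`, `is_leaf` and `check`. The LEAF branch in particular is one possible interpretation of the wording.

```python
import xml.etree.ElementTree as ET


def has_text(elem):
    """True if elem directly contains non-whitespace character data."""
    if elem.text and elem.text.strip():
        return True
    return any(child.tail and child.tail.strip() for child in elem)


def is_leaf(elem):
    """True if elem has no child elements (it may still contain data)."""
    return len(elem) == 0


def check(elem, kind):
    """Return True if elem's direct content fits the declared content type."""
    tags = [child.tag for child in elem]
    if kind == "SINGLE":
        return not has_text(elem) and len(tags) == 1
    if kind == "PAIR":
        return not has_text(elem) and len(tags) == 2 and tags[0] != tags[1]
    if kind == "PAIRS":
        # (a, b)+ : the first two tags differ and repeat in the same order.
        return (not has_text(elem) and len(tags) >= 2 and len(tags) % 2 == 0
                and tags[0] != tags[1]
                and tags == tags[:2] * (len(tags) // 2))
    if kind == "SAME":
        return not has_text(elem) and len(set(tags)) <= 1
    if kind == "LEAF":
        # Assumed reading: data only, or exactly one childless element.
        if not tags:
            return True
        return not has_text(elem) and len(tags) == 1 and is_leaf(elem[0])
    if kind == "LEAVES":
        return all(is_leaf(child) for child in elem)
    if kind == "UNIQUE":
        return not has_text(elem) and len(set(tags)) == len(tags)
    if kind == "NONRECURSIVE":
        # Only direct children are checked, as in the definition.
        return elem.tag not in tags
    raise ValueError(f"unknown declared content type: {kind}")
```

Note that none of the checks mention a specific element name: `check(ET.fromstring("<w><a/><b/><a/><b/></w>"), "PAIRS")` holds for any two distinct names in place of `a` and `b`.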

It should be noted that the first three declared content types could also in part be implemented by a new token in content models, "#ANY", which would allow any element type excluding the allowed successor element types (for determinism). The first six could also probably be modelled using architectural forms or XML Schema architectures, but in a cumbersome way.

All of these declared content types act anonymously: the specific names involved do not matter. This adds a great richness to the kinds of schemas that can be represented.

From the vantage point of extended validation, these declared content types are useful because they can allow more "well-formedness" validation against a node-list: refer to Validate This! Content Models on Different Targets at http://www.ascc.net/xml/en/utf-8/OtherValid.html for more details. WebDAV s4.5 also has a requirement that properties are not duplicated.

From the vantage point of using the validator to perform well-formedness checking, the UNIQUE declared content type seems to fill a gap. It could be used in some kind of ID validation, as a content model on all ID attributes. It can also be used to express the rule that an element cannot have more than one attribute with the same name. Refer to the note previously cited and to XXX: XML Notation Processors at http://www.ascc.net/xml/en/utf-8/xxxmodel.html for more details.
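As a rough illustration of the ID case, the UNIQUE idea can be applied to the list of ID values collected from a document rather than to child elements. This sketch assumes Python's standard `xml.etree.ElementTree` and, since no DTD is consulted, assumes that IDs live in an attribute literally named `id`; the function name `duplicate_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(root, id_attr="id"):
    """Return the sorted ID values that occur more than once under root."""
    counts = Counter(
        elem.get(id_attr)
        for elem in root.iter()
        if elem.get(id_attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)
```

An empty result means the collected ID values satisfy the UNIQUE constraint. The attribute-name rule cannot be demonstrated the same way here, because a conforming parser already rejects an element with a repeated attribute name before any node-list is available.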


Copyright (C) 1999 Rick Jelliffe. Please feel free to publish this in any way you like, but try to update it to the most recent version, and keep my name on it.