Logical Inference from Schematron Schemas

Rick Jelliffe, Allette Systems and Academia Sinica Computing Centre, 20001/02/22 draft

A schema can be reduced to a set of logical propositions. Schemas in different language can be reduced and thus combined. These proposition can be queried as a system. The queries can validate the schema does not have inconsistencies, and to give user guidance about allowed elemetns during editing or validity.

In this note the schema language could be Schematron, the reduction could be an XSL transformation, and the query system is Prolog-ish.


First, we provide some bootstrapping relationships for Trees and schemas.

child (C, P) :- parent( P, C).
ancestor(A,D) :- parent(A,D) or ancestor(A, parent(D)).
descendent(D, A) :- child(D, A) or descendent(D, child(A)).
rootElement (X) :- parent(P, _).
sibling(E1, E2) :- parent( P, E1), parent(P, E2).
previousSibling(E1, E2) :- sibling(E1, E2), ( position(E1) < position(e2)).
nextSibling(E1, E2) :- sibling(E1,E2), (position(E1) > position(E2)).
immediatePreviousSibling(E1, E2) :- sibling(E1, E2), (position(E1) = position(E2) -1).
immediateNextSibling(E1, E2) :- sibling(E1, E2), (position(E1) = position(E2) +1).
first(E1) :- prevousSibling(E1, _).
last(E1) :- nextSibling(E1, _).
allowedChild (C, P) :- allowedParent( P, C).
allowedAncestor(A,D) :- allowedParent(A,D) or allowedAncestor(A, allowedParent(D)).
allowedDescendent(D, A) :- allowedChild(D, A) or allowedDescendent(D, allowedChild(A)).
allowedRootElement (X) :- allowedParent(P, _).
allowedSibling(E1, E2) :- allowedParent( P, E1), allowedParent(P, E2).
allowedPreviousSibling(E1, E2) :- allowedPibling(E1, E2), ( allowedPosition(E1) < allowedPosition(e2)).
allowedNextSibling(E1, E2) :- allowedSibling(E1,E2), (allowedPosition(E1) > allowedPosition(E2)).
allowedImmediatePreviousSibling(E1, E2) :- allowedSibling(E1, E2), (allowedPosition(E1) = allowedPosition(E2) -1).
allowedImmediateNextSibling(E1, E2) :- allowedSibling(E1, E2), (allowedPosition(E1) = allowedPosition(E2) +1).
allowedFirst(E1, P) :- parent(P, E1), allowedPrevousSibling(E1, _).
allowedLast(E1, P) :- parent(P, E1), allowedPextSibling(E1, _).

valid(E1) :- allowedParent(E1, (parent(E1)),
	allowedChild(E1, child(e1)),
	allowedPreviousSibling(E1, previousSibling(E1)),
	allowedNextSibling(E1, nextSiblingE1)),
	allowedImmediatePreviousSibling(E1, immediatePreviousSibling(E1)),
	allowedImmediateNextSibling(E1, immediateNextSiblingE1)).

Note that these fudge details and can be improved: for example, allowedPreviousSibling refers to anywhere in a docuemnt. The could be extended to allow other properties, notable required elements.


Now we define some simple rules for mapping from content models to rules.

<!ELEMENT x ( y )>   
	 =>  allowedChild(y, x).
<!ELEMENT x ( y, z)>
	 => allowedChild(y, x). allowedChild(z, x). allowedFirst(y, x). allowedLast(z, x). allowedPreviousSibling(y, x). 
<!ELEMENT x (y | z)  
	=> allowedChild(y, x). allowedChild(z, x). allowedFirst(y, x). allowedLast(y, x).
allowedFirst(z, x). allowedLast(z, x). 

The we check a node. For each element we say what we have found.

	<x><y/><x>  =>
		child(y, x). 
		parent(y, x). first(y).

Then validate(x) and validate(y).

We can also ask questions, such as "from that y element, what can I insert next?"


The equivalent in Schematron.

	<rule context="x">
		<assert test="y" />
		=> allowedChild(y, x)
	<rule context="x">
		<assert test="y" />
		<assert text="z" />
	      => allowedChild(y, x). allowedChild(z, x). allowedSibling(y, z).
	<rule context="x">
		<assert test="y | z" />
		=> allowedChild(y, x). allowedChild(z,x). not(allowedSibling(y, z)).

Obviously there are many more relations that need to be coped with. In particular, schemas are often incomplete, so validation involves no tests being false rather than no tests being undecidable. And we need to make the distinction between an element x and every element x if we want to move to less single-element-at-a-time sessions. Because Schemas exhibit great locality effect, it is quite possible that traversing a document in document order, loading the parent/sibling/children propositions from the instance, doing the validation or making quereis, and then rolloing back the instance data and moving on, may be just as efficient as loading the complete instance as propositions.

Comments welcome to Rick Jelliffe ricko@allette.com.au