XML: THE EXTENSIBLE MARKUP LANGUAGE

Rick Jelliffe
Academia Sinica Computing Center

XML has just reached its first birthday as a World Wide Web Consortium Recommendation. It has already been adopted as a key technology by Adobe, IBM, Microsoft, Netscape, Oracle, Sun and most other major vendors. (If you are using a recent release of Navigator, Internet Explorer, or RealPlayer, you have probably used XML without knowing it.)

The reason for XML's success is simple:

So that is what XML is: a standard notation for serialized data on the World Wide Web, one that takes care of the most basic issues that serialized data faces:

XML has been successful because it is, in practice, the only standard text-based notation for serialized data. The only serious alternative is ISO's ASN.1 (Abstract Syntax Notation One), but ASN.1 data is normally exchanged in a binary format, while XML is text. So you can create and edit XML documents using a simple text editor, which is found on every computer.
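
To make "a text-based notation for serialized data" concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The element names (settings, setting), the attribute name (name) and the sample values are invented for this illustration; they are not part of any standard.

```python
import xml.etree.ElementTree as ET


def to_xml(settings):
    """Serialize a flat dict of strings to XML text (hypothetical vocabulary)."""
    root = ET.Element("settings")
    for key, value in settings.items():
        item = ET.SubElement(root, "setting", name=key)
        item.text = value
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Read the same structure back into a dict."""
    root = ET.fromstring(text)
    return {item.get("name"): item.text or "" for item in root.findall("setting")}


original = {"language": "zh-TW", "title": "Fish & Chips <daily>"}
text = to_xml(original)
print(text)                 # plain text, readable in any editor
print(from_xml(text) == original)
```

The serialized form is ordinary text, and characters such as & and < are escaped by the library on the way out and restored on the way in, so the application does not have to invent its own quoting rules.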

XML is a subset of SGML (ISO's Standard Generalized Markup Language, ISO 8879:1986 with the 1997 WebSGML Annexes J, K, and L). SGML is very complicated, because it provides many mechanisms to support different syntaxes, and because it was developed to solve the problems of producing and archiving very large sets of interlinked, complex, technical documents. XML takes just the document-structuring core of SGML.

XML has been designed to be easy to implement. There are currently almost 20 publicly available parsers, written in many different programming languages. If you need to send data over the WWW, or to write out data and configuration files to your file system, you should consider using XML. One great advantage is that you get, for free, many tools that can help you examine or edit your data: there are many tools that read an XML document and present it in a tree view, allowing you to edit elements as required.
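
As a rough idea of how little code such a tool needs once a parser is available, the following Python sketch prints a tree view of a small document. It uses the standard library's xml.etree.ElementTree; the sample configuration and the tree_view function are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical configuration file, used only for this sketch.
SAMPLE = """<config>
    <server host="example.org" port="8080"/>
    <user>
        <name>guest</name>
    </user>
</config>"""


def tree_view(element, depth=0):
    """Return a list of indented lines, one per element."""
    attrs = "".join(f' {k}="{v}"' for k, v in element.attrib.items())
    line = "  " * depth + element.tag + attrs
    text = (element.text or "").strip()
    if text:
        line += ": " + text
    lines = [line]
    for child in element:
        lines.extend(tree_view(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
print("\n".join(tree_view(root)))
```

For the sample above this prints the config element at the left margin, with server and user indented beneath it and name indented beneath user. A real editor would add a user interface on top, but the parsing work is already done by the library.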

Furthermore, there are now many XML-based markup languages available for many different uses:

Example

Let's have a look at a simple example. It is a recipe for some food; the recipe can be sent over the Internet to various applications:

<?xml version="1.0" encoding="Big5"?>
<recipe xmlns:DC="http://purl.oclc.org/dc" >
    <DC:TITLE>Sweet Gecko Soup</DC:TITLE>
    <DC:AUTHOR>Rick Jelliffe</DC:AUTHOR>
    <step robot="oven.set(220)">
        Preheat oven to 220 degrees</step>
    <step robot="bowl.put(sugar.amount(1 kg))">
        Put 1 kilo sugar in bowl</step>
    <step robot="bowl.put(water.amount(1 l));bowl.stir(1 minute)" id="s3">
        Stir in 1 liter of hot water for a minute</step>
    <step robot="bowl.add(gecko.amount(1 unit))">
        Add 1 gecko</step>
    <step repeat="10" backto="s3">
        Repeat the last two steps 10 times</step>
    <step robot="bowl.serve; glass.put(redwine); glass.serve">
        Serve with red wine</step>
</recipe>

The important thing to realize is that all the element and attribute names are determined by the user: recipe, DC:TITLE, DC:AUTHOR, step, robot, id, repeat, backto. XML just provides the syntax that allows the structure to be represented and the names and attributes to be given. XML, therefore, is very simple: it is a basic layer which does not attempt to do much. However, like other layered technologies such as coaxial cable, Ethernet, IP or TCP, it can be used to build very complex and powerful systems.
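
To show how an application might pick up these user-defined names, here is a small Python sketch using the standard library's xml.etree.ElementTree. It is only one possible way of reading the recipe. The string below is a shortened excerpt of the recipe, and the XML declaration is left out on the assumption that the text has already been decoded into a Python string.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the recipe; no encoding declaration is needed
# because the text is already a decoded Python string.
RECIPE = """<recipe xmlns:DC="http://purl.oclc.org/dc">
    <DC:TITLE>Sweet Gecko Soup</DC:TITLE>
    <DC:AUTHOR>Rick Jelliffe</DC:AUTHOR>
    <step robot="bowl.put(sugar.amount(1 kg))">
        Put 1 kilo sugar in bowl</step>
    <step robot="bowl.put(water.amount(1 l));bowl.stir(1 minute)" id="s3">
        Stir in 1 liter of hot water for a minute</step>
    <step robot="bowl.add(gecko.amount(1 unit))">
        Add 1 gecko</step>
    <step repeat="10" backto="s3">
        Repeat the last two steps 10 times</step>
</recipe>"""

# The prefix used in queries is local to this program; only the URI must match.
NS = {"dc": "http://purl.oclc.org/dc"}

root = ET.fromstring(RECIPE)
title = root.find("dc:TITLE", NS).text

# Text for a human cook, one entry per step.
instructions = [step.text.strip() for step in root.findall("step")]

# Commands for a (hypothetical) kitchen robot; steps without one are skipped.
commands = [step.get("robot") for step in root.findall("step") if step.get("robot")]

# Every attribute name that appears anywhere in the document.
attribute_names = sorted({name for el in root.iter() for name in el.attrib})

print(title)
print(instructions)
print(commands)
print(attribute_names)
```

One document serves two readers here: a display program can use the element text, while a controller can use the robot attributes. Neither needs to know anything about XML beyond what the parser provides.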

For more examples of XML, see the Chinese XML Now! website, at http://www.ascc.net/xml/


Copyright (C) 1999 Rick Jelliffe. Please feel free to publish this in any way you like, but try to update it to the most recent version, and keep my name on it.