Chinese Tables

Report to CJK DOCP Group,
Taipei, Taiwan, January 1999.

Rick Jelliffe,
Academia Sinica Computing Centre.

Many tables found in CJK documents made before 1994 cannot be marked up using Western SGML element sets for tables, and cannot be typeset using Western typesetting software. In previous CJK meetings (Alaska, Sydney, Taipei in absentia) I have looked at the structure found in some interesting Japanese tables.

In this paper, I suggest the pattern underlying some interesting Chinese tables. I give a DTD for CJK tables, using this model.

Japanese Tables

First, here is a restatement of the structures observed in Japanese technical tables:

  1. 3D tables. These are tables in which a heading cell is divided diagonally, to accommodate both a row heading and a column heading. See "Chinese Tables" below for more details.
  2. Exploded tables. When a cell or region contains too much information, it may be exploded: an arrow may come out of the cells and point to another, neighbouring table. The neighbouring table may overlap the current table, or be joined to it in some way.
  3. Irregular boundaries. Like the Western periodic table, Japanese tables are sometimes better regarded as "communities of cells" which have some relationship to each other, rather than as logical categories created by the headings.
  4. As a further case of this "community of cells" view, some Japanese tables exhibit a diagrammatic tendency. These use a graphical language which seems to be implicitly understood by the Japanese people I have asked, but which Westerners' graphic training does not lead them to expect. This language is that (for a table laid out top-to-bottom, left-to-right, Western-style) rows indicate parallelism, and columns indicate sequence. Similarly, a cell split horizontally indicates parallelism, a cell split vertically indicates sequence, and a cell split diagonally indicates choice (see 3D tables, above).
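
The graphical language in item 4 can be written down as a simple lookup from the way a cell is split to the relation it signals. The Python below is only an illustrative sketch: the names Split, RELATION and relation_for are invented here, and treating "choice" as the value "alternative" (the word used for the kind attribute in the DTD later in this paper) is an assumption.

```python
from enum import Enum


class Split(Enum):
    """How a cell is divided (hypothetical names, for illustration only)."""
    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"
    DIAGONAL = "diagonal"


# Relation signalled by each kind of split, for a table laid out
# top-to-bottom, left-to-right. Writing "choice" as "alternative"
# is an assumption of this sketch.
RELATION = {
    Split.HORIZONTAL: "parallel",
    Split.VERTICAL: "sequence",
    Split.DIAGONAL: "alternative",
}


def relation_for(split: Split) -> str:
    """Return the relation implied by the given kind of split."""
    return RELATION[split]


for split in Split:
    print(split.value, "->", relation_for(split))
```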

Chinese Tables

In the Chinese tables, I suggest the following trends are at work:

Graphic Structures

Logical Structures


I could find few examples of these kinds of tables in recent publications. This is undoubtedly because Western word processors are being used, and tables are being typed rather than drawn. Yet these kinds of tables suggest that there is a rich graphical and structural tradition that is in danger of being lost.

Fitting a table into a given area is less critical for online screen presentation than for fixed-area paper, but it is no less desirable.

CJK Tables

We can propose a generalized form of table, building in the ideas of "community of cells" and "labelling arbitrary regions". This structure brings out the important fact that Western table models inherently have only two levels of nesting (cells in rows), with an extra layer added for headings. Within tables, further hierarchical information is regarded as "a list" or "an embedded table". We are used to arbitrary-depth lists; perhaps we should also become used to arbitrary-depth tables.

        <!ELEMENT community     ( group* )>
        <!ELEMENT group         ( cell | group )* >
                <!ATTLIST group
                        id              ID #REQUIRED
                        belongs-to      IDREFS #IMPLIED
                        html:class      CDATA #IMPLIED
                        kind            ( sequence | parallel | alternative )
                                                "alternative" >
        <!ELEMENT cell          ANY  >
                <!ATTLIST cell
                        id              ID #REQUIRED
                        belongs-to      IDREFS #IMPLIED
                        html:class      CDATA #IMPLIED >

An example instance:

        <community>
        <group id="variables" html:class="heading">
                <cell id="h1">fish</cell>
                <cell id="h2">ocean</cell>
        </group>
        <group belongs-to="h1" id="fish-head" html:class="sub-heading">
                <cell id="salmon">salmon</cell>
                <cell id="tuna">tuna</cell>
        </group>
        <group belongs-to="h2" id="ocean-head" html:class="sub-heading">
                <cell id="pacific">Pacific</cell>
                <cell id="indian">Indian</cell>
        </group>
        <group id="p1">
                <cell belongs-to="salmon pacific" id="sp">delicious</cell>
                <cell belongs-to="salmon indian" id="si">not found</cell>
                <cell belongs-to="tuna pacific" id="tp">delicious</cell>
                <group belongs-to="tuna indian" kind="sequence" id="ti">
                        <cell id="tis" belongs-to="summer">metal-taste</cell>
                        <cell id="tiw" belongs-to="winter">delicious</cell>
                </group>
        </group>
        <group id="seasons" html:class="sub-heading">
                <cell id="summer">Summer</cell>
                <cell id="winter">Winter</cell>
        </group>
        </community>
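
To make the belongs-to relation concrete, here is a minimal Python sketch of how a processor might collect the labels that apply to a cell. It is not part of the proposal itself. It assumes that a cell inherits the belongs-to references of any group enclosing it, which the DTD does not state, and the helper names index_by_id, parent_map and labels_for are hypothetical. The instance is retyped in well-formed form, with the ids tis and tiw matched to summer and winter. The html:class attributes are left out only because a namespace-aware parser would need a declaration for the "html" prefix.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the fish/ocean instance, wrapped in <community>.
INSTANCE = """
<community>
  <group id="variables">
    <cell id="h1">fish</cell>
    <cell id="h2">ocean</cell>
  </group>
  <group belongs-to="h1" id="fish-head">
    <cell id="salmon">salmon</cell>
    <cell id="tuna">tuna</cell>
  </group>
  <group belongs-to="h2" id="ocean-head">
    <cell id="pacific">Pacific</cell>
    <cell id="indian">Indian</cell>
  </group>
  <group id="p1">
    <cell belongs-to="salmon pacific" id="sp">delicious</cell>
    <cell belongs-to="salmon indian" id="si">not found</cell>
    <cell belongs-to="tuna pacific" id="tp">delicious</cell>
    <group belongs-to="tuna indian" kind="sequence" id="ti">
      <cell id="tis" belongs-to="summer">metal-taste</cell>
      <cell id="tiw" belongs-to="winter">delicious</cell>
    </group>
  </group>
  <group id="seasons">
    <cell id="summer">Summer</cell>
    <cell id="winter">Winter</cell>
  </group>
</community>
"""


def index_by_id(root):
    """Map every id attribute to its element."""
    return {el.get("id"): el for el in root.iter() if el.get("id")}


def parent_map(root):
    """Map each element to its parent (ElementTree keeps no parent links)."""
    return {child: parent for parent in root.iter() for child in parent}


def labels_for(el, ids, parents):
    """Text of every cell that el belongs to, outermost group first.

    Assumption: references on enclosing groups also apply to el.
    """
    refs = []
    node = el
    while node is not None:
        refs = node.get("belongs-to", "").split() + refs
        node = parents.get(node)
    return [ids[ref].text for ref in refs]


root = ET.fromstring(INSTANCE)
ids = index_by_id(root)
parents = parent_map(root)
for cell_id in ("sp", "tis"):
    print(cell_id, labels_for(ids[cell_id], ids, parents))
```

Under that inheritance assumption, the nested cell tis picks up "tuna" and "Indian" from its enclosing group as well as its own "Summer".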

The layout engine would have to figure out the appropriate layout: spanning, exploding, 3D headering, popups, etc. However, the sequence of elements in the instance provides strong hints for default renderings.

For example, in the table above, a style-sheet using CJK formatting could render the "variables" in a 3D corner heading. The headings for the seasons could be popup descriptions.
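
As one possible default rendering, a layout engine could place each data cell in a grid by looking up which row heading and which column heading it belongs to. The Python below is a rough sketch of that single step, not a description of any existing engine. The data is copied by hand from the fish/ocean example, the choice of fish as rows and ocean as columns is arbitrary, and the names ROW_HEADS, COL_HEADS, CELLS and default_grid are hypothetical.

```python
# Heading ids in document order: the order of the cells in the instance
# is taken as the hint for the order of rows and columns.
ROW_HEADS = ["salmon", "tuna"]       # cells of group "fish-head"
COL_HEADS = ["pacific", "indian"]    # cells of group "ocean-head"

# (belongs-to ids, content). A list as content stands for a nested group,
# here the kind="sequence" group for tuna in the Indian ocean.
CELLS = [
    (("salmon", "pacific"), "delicious"),
    (("salmon", "indian"), "not found"),
    (("tuna", "pacific"), "delicious"),
    (("tuna", "indian"), [("summer", "metal-taste"), ("winter", "delicious")]),
]


def default_grid(row_heads, col_heads, cells):
    """Place each cell at the row and column named in its belongs-to list."""
    grid = [[None] * len(col_heads) for _ in row_heads]
    for refs, content in cells:
        row = next(r for r in refs if r in row_heads)
        col = next(c for c in refs if c in col_heads)
        grid[row_heads.index(row)][col_heads.index(col)] = content
    return grid


grid = default_grid(ROW_HEADS, COL_HEADS, CELLS)
for head, row in zip(ROW_HEADS, grid):
    print(head, row)
```

A nested group arrives in the grid as a single entry, so the decision to split the cell, explode it into a neighbouring table, or show it as a popup is left to a later stage.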


In my book The XML & SGML Cookbook, I tried to find the basic patterns underlying various document structures. In my view, there have been no systematic or academic attempts to discover these underlying patterns. Of course, there have been attempts in specific areas (e.g. RDF, and perhaps style sheet structures), but I do not think there is an awareness that these patterns exist, and that they would form a useful basis for document processing systems.

(This is not to say that there have been no treatments of tables. A recent paper presented at ??? by ??? in ??? (Springer Verlag, 1998) analyses tables, but only with the graphical presupposition that a table is made from rows and columns with headers; the paper has no discussion of a table being anything more than this. I take this to be because the Western tables being viewed had nothing to challenge this view.)

The architectural form movement also does not address this issue. It is concerned with linking corresponding "architectures" rather than finding fundamental patterns.

In the particular area of tables, there have traditionally been two groups: the "data modellers", who want all data marked up outside the table and linked into a presentation structure, and the "2-dimensionalists", who want to make the best table format they can within the constraints of rows and columns.

The community-of-cells pattern I suggest here falls into neither group, so I do not have much hope that it will find much support. (On the other hand, the availability of Java plug-ins may give us a new age of experimental DTDs.)

I have been working on a further generalization of this pattern. Instead of "belongs-to", there are two relations possible: "superior" and "inferior". The "superior" brings in all kinds of headings, abstracts and so on. The "inferior" brings in cells, glosses and out-of-line annotations.

Open Questions

Graphical Extrapolation

A speculation: perhaps the 3D header cell attempts to be a graphical summary of the table:
  | row
c | d
o |   a
l |     t
u |       a
m |
n |

in which the lines are then rubber-banded to fit the layout of the cells.

Copyright (C) 1999 Rick Jelliffe. Please feel free to publish this in any way you like, but try to update it to the most recent version, and keep my name on it.