Report to CJK DOCP Group,
Taipei, Taiwan, January 1999.
Academia Sinica Computing Centre.
Many tables found in CJK documents made before 1994 cannot be marked up using Western SGML element sets for tables, and cannot be typeset using Western typesetting software. In previous CJK meetings (Alaska, Sydney, Taipei in absentia) I have looked at the structure found in some interesting Japanese tables.
In this paper, I suggest a pattern underlying some interesting Chinese tables, and I give a DTD for CJK tables based on this model.
First, here is a restatement of the structures observed in Japanese technical tables:
In the Chinese tables, I suggest the following trends are at work:
I could find few examples of these kinds of tables in recent publications. This is undoubtedly because Western word processors are being used, and tables are being typed rather than drawn. Yet these kinds of tables suggest that there is a rich graphical and structural tradition that is in danger of being lost.
Fitting a table into a given area is less critical for online screen presentation than for fixed-area paper, but it is no less desirable.
We can propose a generalized form of table, building in the ideas of "community of cells" and "labelling arbitrary regions". This structure brings out the important fact that Western table models inherently have only two levels of nesting (cells in rows), with the addition of an extra layer for headings. Within tables, further hierarchical information is regarded as "a list" or "an embedded table". We are used to arbitrary-depth lists; perhaps we should also become used to arbitrary-depth tables.
    <!ELEMENT community ( group* )>
    <!-- a group may nest further groups, as the "ti" group does in the example below -->
    <!ELEMENT group     ( cell | group )* >
    <!ATTLIST group
              id          ID      #REQUIRED
              belongs-to  IDREFS  #IMPLIED
              html:class  CDATA   #IMPLIED
              kind        ( sequence | parallel | alternative ) "alternative" >
    <!ELEMENT cell      ANY >
    <!ATTLIST cell
              id          ID      #REQUIRED
              belongs-to  IDREFS  #IMPLIED
              html:class  CDATA   #IMPLIED >

For example:

    <community>
      <group id="variables" html:class="heading">
        <cell id="h1">fish</cell>
        <cell id="h2">ocean</cell>
      </group>
      <group belongs-to="h1" id="fish-head" html:class="sub-heading">
        <cell id="salmon">salmon</cell>
        <cell id="tuna">tuna</cell>
      </group>
      <group belongs-to="h2" id="ocean-head" html:class="sub-heading">
        <cell id="pacific">Pacific</cell>
        <cell id="indian">Indian</cell>
      </group>
      <group id="p1">
        <cell belongs-to="salmon pacific" id="sp">delicious</cell>
        <cell belongs-to="salmon indian" id="si">not found</cell>
        <cell belongs-to="tuna pacific" id="tp">delicious</cell>
        <group belongs-to="tuna indian" kind="sequence" id="ti">
          <cell id="tiw" belongs-to="summer">metal-taste</cell>
          <cell id="tis" belongs-to="winter">delicious</cell>
        </group>
      </group>
      <group id="seasons" html:class="sub-heading">
        <cell id="summer">Summer</cell>
        <cell id="winter">Winter</cell>
      </group>
    </community>
The layout engine would have to figure out the appropriate layout: spanning, exploding, 3D headering, popups, etc. However, the sequence of elements in the instance provides strong hints for default renderings.
For example, in the table above, a style-sheet using CJK formatting could render the "variables" in a 3D corner heading. The headings for the seasons could be popup descriptions.
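As a contrast to the CJK rendering, here is a rough Python sketch of the simplest fallback a layout engine might choose: a plain two-dimensional grid, with rows and columns taken in instance order. Everything here is an assumption made for illustration: the names ROW_HEADS, COL_HEADS, CELLS, default_grid and render are hypothetical, the data is a hand-written extract of the fish/ocean instance, and the nested tuna/Indian group is left out, so its slot stays empty where a real engine would have to explode it or use a popup.

```python
# Hypothetical in-memory extract of the instance: two heading groups, and
# data cells that each name one heading from each group via belongs-to.
ROW_HEADS = {"salmon": "salmon", "tuna": "tuna"}        # id -> label
COL_HEADS = {"pacific": "Pacific", "indian": "Indian"}  # id -> label
CELLS = [
    ("delicious", ["salmon", "pacific"]),
    ("not found", ["salmon", "indian"]),
    ("delicious", ["tuna", "pacific"]),
]


def default_grid(row_heads, col_heads, cells):
    """Place each cell at the (row, column) named by its belongs-to ids."""
    placed = {}
    for text, refs in cells:
        row = next(r for r in refs if r in row_heads)
        col = next(r for r in refs if r in col_heads)
        placed[(row, col)] = text
    # Heading order follows the order in which the headings were given.
    grid = [[""] + list(col_heads.values())]
    for row_id, row_label in row_heads.items():
        grid.append(
            [row_label] + [placed.get((row_id, col_id), "") for col_id in col_heads]
        )
    return grid


def render(grid):
    """Render the grid as plain text with padded columns."""
    widths = [max(len(row[i]) for row in grid) for i in range(len(grid[0]))]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in grid
    )


grid = default_grid(ROW_HEADS, COL_HEADS, CELLS)
print(render(grid))
```

The empty tuna/Indian slot shows where a rows-and-columns fallback runs out: the information in the nested group has nowhere to go without spanning, exploding or a popup.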
In my book The XML & SGML Cookbook, I tried to find the basic patterns underlying various document structures. In my view, there have been no systematic or academic attempts to discover these underlying patterns. Of course, there have been attempts in specific areas (e.g. RDF, and perhaps style sheet structures), but I do not think there is an awareness that these patterns exist, and that they would form a useful basis for document processing systems.
(This is not to say that there have been no treatments of tables. A recent paper presented at ??? by ??? in ??? (Springer Verlag, 1998) analyses tables, but only with the graphical presupposition that a table is made from rows and columns with headers; the paper never considers that a table might be anything more than this. I take this to be because the Western tables being viewed had nothing to challenge this view.)
The architectural form movement also does not address this issue. They are concerned with linking corresponding "architectures" rather than finding fundamental patterns.
In the particular area of tables, there have traditionally been two groups: the "data modellers", who want all data marked up outside the table and linked into a presentation structure, and the "2-dimensionalists", who want to make the best table format they can within the constraints of rows and columns.
The community-of-cells pattern I suggest here falls into neither group, so I do not have much hope that it will find wide support. (On the other hand, the availability of Java plug-ins may give us a new age of experimental DTDs.)
I have been working on a further generalization of this pattern. Instead of "belongs-to", there are two relations possible: "superior" and "inferior". The "superior" brings in all kinds of headings, abstracts and so on. The "inferior" brings in cells, glosses and out-of-line annotations.
              |   row
    ---------------------
        c     |    d
        o     |    a
        l     |    t
        u     |    a
        m     |
        n     |
in which the lines are then rubber-banded to fit the layout of the cells.
Copyright (C) 1999 Rick Jelliffe. Please feel free to publish this in any way you like, but try to update it to the most recent version, and keep my name on it.