<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="qaml.css" ?>
<faq >
  <head>
     <title>
      Chinese Metadata FAQ
    </title>
    <maintain><name>Rick Jelliffe</name>
        <email>ricko@gate.sinica.edu.tw</email>
        </maintain>
    <hdr>
       <type>Programming</type>
       <content>Metadata, Dublin Core, CIMI, RDF, EAD, 
       PICS, MARC,  Z39.50,
       XML, SGML, Chinese, FAQ</content>
    </hdr>
    <archive href="http://xml.ascc.net/xml/en/utf-8/faq.xml"/>
  </head>
  <body  xml:lang="en">
  <section id="intro">
    <logo href="../../graphics/xml.gif" 
         alt="The XML Logo (from the XML FAQ)" />
    <title>
      Chinese Metadata F.A.Q.
    </title>
    <p>
      This FAQ is about using XML-based metadata (e.g., Dublin
      Core, RDF and EAD), including its use for Chinese-language material.
    </p>
    <p>
      This FAQ was created to help answer questions arising in the
      Digital Library/Museum project at Academia Sinica, Taiwan.
    </p>
    </section>
    <section id="sect-a">
    <title>
      A. General
    </title>
  
<qna id='a1'><q class='h2' >
      A.1. What is Metadata?
    </q><author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author> <a gist="Data about data" >
    <p>
      Metadata is "<span class='i' >data about data</span>"; it is data for the
      purposes of cataloging, searching, archiving, electronic
      discovery, displaying, and so on. The key indication of the
      direction of the WWW on metadata comes from the inventor of
      the WWW, Tim Berners-Lee (in Metadata Architecture, at <link
      href="http://www.w3.org/DesignIssues/Metadata.html">
      http://www.w3.org/DesignIssues/Metadata.html</link >):
      "<span class='i' >Metadata is machine understandable information about web
      resources or other things</span>" but "<span class='i' >metadata is
      data.</span>"
    </p>    </a>
    </qna>
    </section>
    <section id="sect-b">
    <title>
      B. WWW Metadata "Standards"
    </title>
    <p>
      For a good review of Metadata standards, see "<span class='i' >Review of
      Metadata Formats</span>" by Rachel Heery, 1996 (at <link href= 
      "http://www.oasis-open.org/cover/heery-review.html">
      http://www.oasis-open.org/cover/heery-review.html</link > )
    </p>   
<qna id='b1'><q class='h2' >
      B.1. What is the Dublin Core (DC)?
    </q><author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author> <a gist="Metadata for WWW resources">
    <p>
      The Dublin Core home page (<link href= 
      "http://purl.oclc.org/dc/index.htm">http://purl.oclc.org/dc/index.htm</link >)
      says <span class='i' >"The Dublin Core is a metadata element set intended
      to facilitate discovery of electronic resources. Originally
      conceived for author-generated description of Web resources,
      it has attracted the attention of formal resource description
      communities such as museums, libraries, government agencies,
      and commercial organizations."</span>
    </p>
    <p>
      At the moment, there are many different formats used to
      catalog information. Dublin Core provides a very simple
      subset (or classification) of them. In the future, we
      anticipate all catalogs will support access using Dublin Core
      metadata. Dublin Core defines 15 core element names.
    </p>
    <p>
      For an example of Dublin Core metadata, see the end of this
      file.
    </p>
    <p>
      The Dublin Core is not finalized. It is still being
      developed. All Dublin Core implementations are therefore, to
      a certain extent, experimental. Please also beware that much
      of the "draft" Dublin Core material is itself based on other
      drafts, so quite a lot of the information is unreliable.
    </p>   </a>
    </qna>
<qna id='b2'><q class='h2' >
      B.2. What is Qualified Dublin Core (QDC)?
    </q> <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author><a gist="More elaborate metadata for WWW resources">
    <p>
      The simple Dublin Core elements are too simple for many
      users, who have started to develop subclasses of elements
      based on the Dublin Core. So now there is "simple" Dublin
      Core, and "qualified" Dublin Core.
    </p>
    <p>
      Note: The Qualified Dublin Core elements are not finalized. They
      are still being developed. All Dublin Core implementations
      are therefore, to a certain extent, experimental.
    </p>  </a>
    </qna>
<qna id='b3'><q class='h2' >
      B.3. Can I use Dublin Core with XML or HTML?
    </q> <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author><a gist="Yes" >
    <p>
      Yes. Dublin Core is merely a set of field names and
      descriptions. It is not a file format (or <span class='i' >notation</span>).
      However, you can implement Dublin Core using the
      following notations:
    </p>
    <div class="ul">
      <p class='li'>
        HTML (HyperText Markup Language-- <link href= 
        "http://www.w3.org/MarkUp/">http://www.w3.org/MarkUp/</link >),
        the markup language of the World Wide Web; see also
        Miller's article <link href= 
        "http://www.oasis-open.org/cover/miller-issues.html">
        http://www.oasis-open.org/cover/miller-issues.html</link >
      </p>
      <p class='li'>
        SGML (Standard Generalized Markup Language-- <link href= 
        "http://www.oasis-open.org/cover/sgml-xml.html">
        http://www.oasis-open.org/cover/sgml-xml.html</link >), the
        sophisticated ISO-standard markup language used for
        high-end publishing and archiving;
      </p>
      <p class='li'>
        XML (eXtensible Markup Language-- <link href= 
        "http://www.w3.org/XML/">http://www.w3.org/XML/</link >), the
        subset of SGML optimised for simplicity and speed for Web
        use. (It seems that new versions of HTML will also use XML
        syntax: <link href="http://www.w3.org/MarkUp/">
        http://www.w3.org/TR/WD-html-in-xml/</link >.) There is a
        Chinese XML FAQ at <link href="http://www.ascc.net/xml/">
        http://www.ascc.net/xml/</link >. The best version of the XML
        Spec is Tim Bray's Annotated XML Specification, at <link href= 
        "http://www.xml.org/">http://www.xml.org/</link >
      </p>
      </div>
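    <p>
      As a rough illustration of the HTML case (a sketch only, not
      taken from the Dublin Core documents cited above): Dublin Core
      fields are commonly carried in <span class='i' >meta</span> elements whose
      names carry a "DC." prefix. The values below come from this
      FAQ's own header; the exact spelling of the field names is an
      assumption made for this example, and the elements are written
      in XML syntax.
    </p>
    <p xml:space="preserve"><![CDATA[
<head>
  <title>Chinese Metadata FAQ</title>
  <!-- Each meta element holds one Dublin Core field -->
  <meta name="DC.Title" content="Chinese Metadata FAQ" />
  <meta name="DC.Creator" content="Rick Jelliffe" />
  <meta name="DC.Language" content="en" />
</head>
]]></p>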
            </a>
    </qna>
<qna id='b4'><q class='h2' >
      B.4. Can I use Dublin Core for Chinese Metadata?
    </q>   <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author>     <a gist="Yes" >
    <p>
      Yes. Dublin Core is merely a set of field names and
      descriptions. It is not a file format. XML, SGML and HTML 4
      all allow Han ideographs. There is more information on
      Chinese XML at the <span class='i' >Chinese XML FAQ</span> at <link href= 
      "http://www.ascc.net/xml/en/utf-8/faq.html">
      http://www.ascc.net/xml/en/utf-8/faq.html</link >. This FAQ will
      have specific answers to questions about Chinese metadata and
      Dublin Core too.
    </p>           </a>
    </qna>
<qna id='b5'><q class='h2' >
      B.5. What is the Resource Description Framework (RDF)?
    </q>       <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author>        <a gist="Make assertions about WWW resources" >
    <p>
      The RDF home page (<link href= 
      "http://www.w3.org/RDF/Overview.html">http://www.w3.org/RDF/Overview.html</link >)
      says "<span class='i' >RDF is designed to provide an infrastructure to
      support metadata across many web-based activities.</span>" RDF
      is very verbose when used to mark up elements, and it is also
      quite complex to understand.
    </p>
    <p>
      The lesson of SGML is that there are great benefits in
      marking up documents generically, according to <span class='i' >document
      type,</span> rather than with specific details for each
      element. This is the only way to handle large data sets, in
      many cases. Applying this lesson to RDF, it is probable that
      RDF will be most beneficial when applied to element types
      rather than to element instances: the element type becomes a
      kind of <span class='i' >macro</span> for the RDF.
    </p>
    <p>
      RDF is frequently mentioned in material about Dublin Core.
      But you can use Dublin Core without RDF.
    </p>
    <p>
      RDF provides extra value to Dublin Core metadata. Generic RDF
      tools (for example, for visualization and discovery) will be
      able to use DC+RDF data even though they do not understand
      Dublin Core elements.
    </p>                  </a>
    </qna>
<qna id='b6'><q class='h2' >
      B.6. Can I use RDF with XML or HTML?
    </q> <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author><a gist="Yes" >
    <p>
      Yes. RDF is a conceptual model to let you make assertions
      about elements of data. It is not a file format. However, you
      can implement RDF systems using XML (or SGML). You can
      implement some kinds of RDF using HTML too.
    </p> 
    <p>
      RDF is best for database-style "fielded
      records" rather than for adding to existing free text data,
      or data ordered by the requirements of publishing systems.
    </p>    
    <p>There is a DTD for RDF available at
    <link href="http://www.ascc.net/xml/en/utf-8/resource-index.html"
    >http://www.ascc.net/xml/en/utf-8/resource-index.html</link>.
    </p>
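    <p>
      A minimal sketch of RDF written in XML, carrying two Dublin
      Core properties, might look like the following. The namespace
      URIs are those used by the later W3C and Dublin Core
      specifications; the resource address and the values are
      assumptions made for this example.
    </p>
    <p xml:space="preserve"><![CDATA[
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- One description makes assertions about one resource -->
  <rdf:Description rdf:about="http://www.example.org/faq.xml">
    <dc:title>Chinese Metadata FAQ</dc:title>
    <dc:creator>Rick Jelliffe</dc:creator>
  </rdf:Description>
</rdf:RDF>
]]></p>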
    </a>
    </qna>
<qna id='b7'><q class='h2' >
      B.7. What is the safest way to use Dublin Core and RDF?
    </q>  <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author>     <a gist="Dublin Core guidelines">
    <p>
      The Dublin Core working group on Data Models has published
      some guidelines which seem safe. See <link href= 
      "http://www.mailbase.ac.uk/lists/dc-datamodel/1998-09/0029.html">
      http://www.mailbase.ac.uk/lists/dc-datamodel/1998-09/0029.html</link >.
    </p>
    <p>
      These basic guidelines can be summarized:
    </p>
    <div class="ul">
      <p class='li'>
        use (only) Dublin Core elements for all descriptions of
        external objects;
      </p>
      <p class='li'>
        never use Dublin Core elements for higher-level schema
        elements;
      </p>
      <p class='li'>
        use RDF's assertion-based data model for higher-level
        schema elements;
      </p>
      <p class='li'>
        the value of a (simple) Dublin Core element should be a
        simple string or a URI (i.e., XML's PCDATA);
      </p>
      <p class='li'>
        in the case of agents (i.e., Author, Contributor, or
        Publisher), the value could also be some in-line RDF;
      </p>
      <p class='li'>
        if you use Qualified Dublin Core elements, you should also
        use RDF to connect these with the simple Dublin Core types
        (and you should use a separate XML namespace and prefix).
      </p>
      </div>
                  </a>
    </qna>
<qna id='b8'><q class='h2'  dir="ltr">
      B.8. What is the Warwick Framework?
    </q>     <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author>        <a gist="Bundle metadata into containers" >
    <p>
      The Warwick Framework is a method to bundle <span class='i' >packages</span>
      (of metadata) together (into a <span class='i' >container</span>). The
      packages can be of different formats. The Warwick Framework
      lets you specify the relationship between packages in the
      container; this helps transmission and queries. The
      containers can be expressed in:
    </p>
    <div class="ul">
      <p class='li'>
        SGML/XML,
      </p>
      <p class='li'>
        HTML, and
      </p>
      <p class='li'>
        MIME headers.
      </p>
      </div>
    
    <p>
      The paper defining the Warwick Framework is "The Warwick
      Framework: A Container Architecture for Aggregating Sets of
      Metadata" by Carl Lagoze, Clifford A. Lynch and Ron Daniel Jr.
      (see <link href= 
      "http://lists.w3.org/Archives/Public/www-disw/msg00017.html">
      http://lists.w3.org/Archives/Public/www-disw/msg00017.html</link >).
    </p>                </a>
    </qna>
<qna id='b9'><q class='h2' >
      B.9. What is the Encoded Archival Description (EAD)?
    </q> <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author><a gist="To specify finding aids for archived objects"> 
    <p>
      Encoded Archival Description (EAD) is an XML/SGML DTD for
      specifying <span class='i' >finding aids</span> for archived objects. It
      started at the Library of the University of California,
      Berkeley, and is now being developed by the Society of American
      Archivists and (the Network Development and MARC Standards Office
      of) the Library of Congress. See <link href= 
      "http://www.loc.gov/ead/ead.html">
      http://www.loc.gov/ead/ead.html</link >. It was in use and
      testing for several years, and was officially released
      in 1998. EAD is a traditional XML/SGML DTD, and provides very
      detailed fields for metadata.
    </p>
    <p>
      The EAD is very different from the Dublin Core and RDF.
      Perhaps we can say that Dublin Core is an incomplete
      "top-down" design, while EAD is a completed "bottom-up"
      design.
    </p>
    <div class="ul">
      <p class='li'>
        Dublin Core provides very basic fundamental types of
        catalog metadata, which can be used by generic Dublin Core
        tools;
      </p>
      <p class='li'>
        Qualified Dublin Core provides more detailed subtypes of
        these, which can be used by generic Qualified Dublin Core
        tools;
      </p>
      <p class='li'>
        RDF provides a way of formally asserting the connection of
        these types to data, which can be used by generic RDF
        tools;
      </p>
      <p class='li'>
        EAD is based on traditional SGML document analysis, where
        the intention is to provide data to feed custom-built
        EAD-specific finding tools. EAD contains only elements for
        "<span class='i' >description, control, navigation, indexing, and online
        and print presentation</span>." (See <link href= 
        "http://www.loc.gov/ead/eaddsgn.html">
        http://www.loc.gov/ead/eaddsgn.html</link >.)
      </p>
    
    </div>
    <p>
      EAD won the 1998 "<span class='i' >Coker Prize for Description</span>" from
      the Society of American Archivists. See <link href= 
      "http://www.archivists.org/awards/coker.html">
      http://www.archivists.org/awards/coker.html</link >
    </p>
    <p>
      A good site is the Berkeley Digital Library <link href= 
      "http://sunsite.berkeley.edu/SGML/index.html">
      http://sunsite.berkeley.edu/SGML/index.html</link >. An example
      of EAD in use is <link href= 
      "http://sunsite2.berkeley.edu:28008/dynaweb/oac/bampfa/@Generic__CollectionView">
      http://sunsite2.berkeley.edu:28008/dynaweb/oac/bampfa/@Generic__CollectionView</link >
    </p>   </a>
    </qna>
<qna id='b10'><q class='h2' >
      B.10. What is PICS?
    </q>   <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author>  <a gist="Platform for Internet Content Selection" >
    <p>
      PICS is the <span class='i' >Platform for Internet Content Selection</span>.
      The PICS home page (<link href=
      "http://www.w3.org/PICS/">http://www.w3.org/PICS/</link >) says
      PICS "enables labels (metadata) to be associated with
      Internet content. It was originally designed to help parents
      and teachers control what children access on the Internet,
      but it also facilitates other uses for labels, including code
      signing and privacy."
    </p>
    <p>
      There are several related specifications for PICS:
    </p>  
    <div class="ul">            
      <p class='li'>
        <link href="http://www.w3.org/TR/REC-PICS-services"><span class='i' >
        <span class='b' >Service descriptions</span></span>:</link > Specifies a
        MIME type and data format for describing a rating service.
      </p>
      <p class='li'>
        <link href="http://www.w3.org/TR/REC-PICS-labels"><span class='i' ><span class='b' >
        Label format and distribution</span></span><span class='i' >:</span></link >
        Specifies label formats for use in 
        </p>
          <p class='li2'>
            HTML,
          </p>
          <p class='li2'>
            MIME headers and
          </p>
          <p class='li2'>
            external documents
          </p>
                    
      <p class='li'>
        <link href="http://www.w3.org/TR/REC-PICSRules"><span class='i' ><span class='b' >
        PICSRules:</span></span></link > Specifies a format to allow
        filtering of data based on the PICS labels.
      </p>
      <p class='li'>
        <link href="http://www.w3.org/TR/REC-DSig-label/"><span class='b' ><span class='i' >PICS
        Signed Labels (DSig) 1.0 Specification:</span></span></link > Allows PICS
        labels to be digitally signed.
      </p>
    </div>
    <p>
      PICS is thus concerned with delivering metadata for use in
      establishing sessions and data access, not for archiving or
      data manipulation.
    </p>
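    <p>
      As a rough sketch of the HTML label format listed above, a
      PICS label can be embedded in a document head as a
      <span class='i' >meta</span> element. The rating service address and the
      rating category ("violence") below are hypothetical; a real
      rating service defines its own categories and values. The
      element is written in XML syntax.
    </p>
    <p xml:space="preserve"><![CDATA[
<head>
  <!-- The label names the rating service, the page being
       labelled, and the ratings assigned to it -->
  <meta http-equiv="PICS-Label"
        content='(PICS-1.1 "http://ratings.example.org/v1.0"
                 labels for "http://www.example.org/page.html"
                 ratings (violence 0))' />
</head>
]]></p>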
    <p>
      Note that there are also some attempts to extend PICS as a
      more general Schema language. For example, to describe Dublin
      Core: <link href="http://www.dstc.edu.au/RDU/DCMD/">
      http://metadata.net/dstc/DC-10-EN/schema.txt</link >.
    </p>                </a>
    </qna>
<qna id='b11'><q class='h2' >
      B.11. What is the relation between EAD, PICS, Dublin Core,
      RDF and the Warwick Framework?
    </q> <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author> <a gist="Appropriate for different stages" >
    <p>
      EAD, Dublin Core/RDF and PICS all serve different stages
      of document production. Let's use the terminology of the movie
      industry (Ted Nelson has said that hypertext is a kind of
      movie-making), but applied to electronic archives and Web
      sites:
    </p>
    <div class="ul">
      <p class='li'>
        <span class='b' >Pre-production</span>: archiving, scanning,
        marking up data and metadata;
      </p>
      <p class='li'>
        <span class='b' >Production</span>: storing in database, indexing,
        running queries, etc.
      </p>
      <p class='li'>
        <span class='b' >Post-production</span>: access through a
        web-server, formatting results of database queries,
        designing and attaching appropriate look-and-feel for
        different classes of users and media, handling navigation;
      </p>
      <p class='li'>
        <span class='b' >Session</span>: authorizing logins, locating
        resources, negotiating appropriate content, maintaining
        state-information about user sessions (cookies, etc.).
      </p>
        </div>    
    <p>
      EAD is therefore aimed at pre-production needs--archivists
      need to store all the relevant information that they have,
      whether or not it fits into nice Dublin Core categories.
      (The EAD Design Principles put it this way: "<span class='i' >The
      needs of public users, curatorial and reference staff, and
      finding aid authors were given priority in the standard's
      design</span>." Furthermore, "<span class='i' >Finding aids are not objects
      of study but rather tools leading to such objects</span>." See
      <link href="http://www.loc.gov/ead/eaddsgn.html">
      http://www.loc.gov/ead/eaddsgn.html</link >.)
    </p>
    <p>
      Dublin Core is aimed at (post-)production needs--it is a
      simple interface for allowing interchange and access, but it
      does not attempt to provide any higher-level structures: it
      does not treat the differences in objects as significant: a
      stick is the same as a library.
    </p>
    <p>
      RDF is aimed at post-production needs--providing a way to tie
      together information from lots of different schemas, which
      may have nothing to do with archiving.
    </p>
    <p>
      PICS is aimed at session needs--deciding whether a type of
      information is appropriate for a particular user or
      situation.
    </p>
    <p>
      The Warwick Framework is a format by which EAD, Dublin Core,
      RDF or PICS schemas can be bundled together and interchanged.
      (However, this functionality may be, to a certain extent,
      duplicated by RDF and perhaps EAD. I would expect RDF to win
      popular support, if they are competitors.)
    </p>
        </a>
    </qna>
<qna id='b12'><q class='h2' >
      B.12. What is MARC?
    </q> <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author>        <a gist="Library metadata record format" >
    <p>
      MARC is the Big Daddy. But he has multiple personalities.
      MARC is a metadata record format for libraries.
    </p>
    <p>
      MARC records use the ISO 2709:1981 format. However,
      there is recent work to also provide an XML format.
    </p>  </a>
    </qna>
<qna id='b13'><q class='h2' >
      B.13. What is ANSI Z39.50?
    </q>  <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author> <a gist="Database query protocol">
    <p>
      ANSI Z39.50 is a database query protocol, for querying a
      catalog about library holdings. It is very suitable for MARC
      data. Holdings information is (typically) returned in the
      OPAC format. More recently, CIMI has defined a "profile" to
      support retrieval of museum information too. See question
      B.15 below.
    </p>
    <p>
      Z39.50 uses ISO ASN.1 (Abstract Syntax Notation 1) rather
      than SGML (See XML, question C.1.) This makes it much more
      efficient for many small transactions.
    </p>
    <p>
      For a good bibliography, see Lynch, Clifford A. (1994). <span class='i' >
      RFC 1729, Using the Z39.50 Information Retrieval Protocol in
      the Internet Environment.</span> <link href= 
      "http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1729.txt">
      http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1729.txt</link >
    </p>      </a>
    </qna>
<qna id='b14'><q class='h2' >
      B.14. What is TEI?
    </q>   <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author>    <a gist="Text Encoding Initiative" >
    <p>
      The Text Encoding Initiative (TEI) is "<span class='i' >an international
      project to develop guidelines for the preparation and
      interchange of electronic texts for scholarly research, and
      to satisfy a broad range of uses by the language industries
      more generally.</span>" The TEI has made a family of DTDs for all
      sorts of electronic texts. These include the ability to have
      some metadata, in the TEI headers. A text or object may have
      multiple TEI headers.
    </p>
    <p>
      In complexity or richness, the TEI headers provide more than
      Dublin Core, but certainly less than EAD or MARC. Perhaps
      Qualified Dublin Core will be similar. TEI headers seem to be
      intended to allow suppliers of material to give a good
      headstart to catalogers, rather than being a relentless
      enumeration of every bibliographic possibility.
    </p>
    <p>
      The TEI Home Page is <link href="http://www.uic.edu/orgs/tei/">
      http://www.uic.edu/orgs/tei/</link >. The current specifications
      can be found at <link href= 
      "http://etext.lib.virginia.edu/TEI.html">
      http://etext.lib.virginia.edu/TEI.html</link >. The specification
      for the header is in Chapter 5: <link href= 
      "http://etext.virginia.edu/bin/tei-tocs?div=DIV1&amp;id=HD">
      http://etext.virginia.edu/bin/tei-tocs?div=DIV1&amp;id=HD</link >
    </p>
    <p>An XML version of the TEI Lite DTD can be found on the
    Resources page of the
    <link href="http://xml.ascc.net/xml/">Chinese XML Now!</link > site.</p></a>
    </qna>
<qna id='b15'><q class='h2' >
      B.15. What is CIMI?
    </q>  <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author><a gist="Computer Interchange of Museum Information">
    <p>
      CIMI is the <span class='i' >Consortium for the Computer Interchange of
      Museum Information</span>. Their home page is <link href= 
      "http://www.cimi.org/">http://www.cimi.org/</link >
    </p>
    <p>
      CIMI's approach is to foster standards: in particular
      "<span class='i' >SGML for structuring information and Z39.50 for
      information interchange</span>." CIMI uses a
      TEI-based DTD (document type definition) for defining
      structured documents. It seems that they may be also keen on
      supporting the Dublin Core, to some extent (how?).
    </p>
    <p>
      The CIMI material includes some good mechanisms to feed
      post-production or session stages: for example "wall text" to
      accompany objects on display. The main one is the "access
      point" attribute.
    </p>
    <p>
      For querying, CIMI defines a profile of Z39.50. It builds on
      the Library of Congress <span class='i' >Collections Profile</span> (see <link
      href= 
      "http://lcweb.loc.gov/z3950/agency/profiles/collections.html">
      http://lcweb.loc.gov/z3950/agency/profiles/collections.html</link >
      )
    </p>
    <p>
      A history can be found at <link href= 
      "http://www.cimi.org/about/history.html">
      http://www.cimi.org/about/history.html</link >
    </p>     </a>
    </qna>
<qna id='b16'><q class='h2' >
      B.16. What is XPointer?
    </q>  <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author>      <a gist="Hypertext locations">
    <p>
      XPointer is a standard mechanism for locating elements in
      document structures using various criteria (attribute values,
      absolute and relative hierarchical position, etc). It is
      based on W3C URI hyperlinks, on ISO HyTime hyperlink
      navigation models, and on TEI location syntax. It is
      currently in draft at <link href="http://www.w3.org/TR/">
      http://www.w3.org/TR/</link >. An XPointer can be the data for a
      query or the result of a query.
    </p>            </a>
    </qna>
<qna id='b17'><q class='h2' >
      B.17. What is FGDC?
    </q> <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author> <a gist="Geographical metadata">
    <p>
      FGDC is the Federal Geographic Data Committee. Its metadata
      standard is <span class='i' >Content standards for digital geospatial
      metadata</span> (June 8, 1994, Washington, D.C.); see <link href= 
      "http://geology.usgs.gov/tools/metadata/standard/metadata.html">
      http://geology.usgs.gov/tools/metadata/standard/metadata.html</link >.
    </p>     </a>
    </qna>
<qna id='b18'><q class='h2' >
      B.18. What is CML?
    </q>   <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author>      <a gist="Chemical Markup Language">
    <p>
      CML is the Chemical Markup Language. It is an XML DTD for
      metadata for chemical documents.
    </p>            </a>
    </qna>
<qna id='b19'><q class='h2' >
      B.19. What is BSML?
    </q>   <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author>            <a gist="Bioinformatic Sequence Markup Language">
    <p>
      BSML is the Bioinformatic Sequence Markup Language. It is an
      XML DTD for metadata for genetic information. It has a rich
      set of presentation elements, so it may perhaps be regarded as
      a presentation DTD more than a metadata DTD. However, it
      features the ability to invoke data from many different
      formats.
    </p>                  </a>
    </qna>
    </section>
    <section id="sect-c">
    <title>C. Implementation</title>
    <qna id="c1">
      <q>
      C.1 I'm confused! Do we really need all these kinds of metadata?
      </q><author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author>
      <a gist="For duplicated standards: no">
      <p>A lot of the metadata standards are duplicates. There is sometimes
      no convincing reason to select one standard over another. 
      And, even for big standards like MARC, you may still find you need to
      add your own element types, because of local requirements.
      </p>   </a>
      </qna>
      
      <qna id='c2'><q class='h2' >
      C.2 Can I use a CMARC Code if I really want to?
    </q><author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author><a gist="Yes" >
    <p>    
      Sure. You can add any kind of attributes you like to
      XML-based metadata. You can make up a DTD using CMARC
      codes in the element type names, or you can allow elements
      to have an attribute in which the appropriate CMARC code
      can be specified.
    </p>   </a>
    </qna>
    <qna id="c3"><q>C.3 Can I use CCCII with XML?</q>
    <a gist="No">
    <p>At the moment, there is no way to use XML with CCCII
    without losing information about variants: XML must 
    use ISO 10646 (Unicode)
    as its document character set. </p>
    <p>However, XML is just one possible subset of SGML. It is 
    completely legitimate to create your own subset of SGML,
    which follows XML in everything <span class="em">except</span> that it uses
    CCCII as the document character set. Such a markup language
    would be called "CCCII-XML": no one has done this, but several
    libraries have asked about it.</p>
    </a>
    </qna>
      <qna id="c4">
      <q>
      C.4 Do we need to choose a single metadata standard?</q> 
      <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author>
        <a gist="No"><p>In many cases, different metadata standards have
      the same data, but use different structures and names for
      it. A good approach is to use the <span class="em">most specific</span>
      DTD you can, for each different class of data.
      </p>
      <p>You can always map from a specific structure to a more general one.
      You cannot reliably map from general to specific, because the general
      form no longer records the distinctions the specific form needs.
      </p>
      <p>(You should <span class="em">plan to transform</span> your
      data. Many text processing tools are available to help
      you: Perl, Python, XSL (LotusXSL, XT, Koala, etc.), Cost, 
      OmniMark, Balise. The DOM programming interface also makes
      transformations available in your browser, using Java or JavaScript.)</p>
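    <p>
      As an illustration, a specific-to-general mapping can be a very
      small transformation. The sketch below uses XSLT 1.0 syntax and
      assumes a hypothetical source format with
      <span class='tt' >record</span>, <span class='tt' >title-proper</span>
      and <span class='tt' >compiler</span> elements; the
      <span class='tt' >description</span> wrapper is also made up. The
      <span class='tt' >dc</span> prefix is bound to the Dublin Core 1.1
      element namespace. Adjust all names to your own DTD.
    </p>
<p class='pre' xml:space='preserve' >
<![CDATA[<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/">

  <!-- One (hypothetical) source record becomes one general description -->
  <xsl:template match="record">
    <description>
      <xsl:apply-templates select="title-proper | compiler"/>
    </description>
  </xsl:template>

  <!-- Specific: title-proper. General: dc:title. Keep the language tag. -->
  <xsl:template match="title-proper">
    <dc:title>
      <xsl:copy-of select="@xml:lang"/>
      <xsl:value-of select="."/>
    </dc:title>
  </xsl:template>

  <!-- Specific: compiler. General: dc:creator. -->
  <xsl:template match="compiler">
    <dc:creator><xsl:value-of select="."/></dc:creator>
  </xsl:template>

</xsl:stylesheet>]]>
</p>
    <p>
      In the output, nothing records that the creator was a compiler
      rather than an author, which is why the reverse mapping cannot
      be automated.
    </p>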
           
    <p>
      This approach of using very specific, targeted DTDs
      closely follows the one recommended in Light and Burnard's study
      <span class="em">Three SGML Metadata Formats: TEI, EAD and CIMI</span> (<link
      href="http://hosted.ukoln.ac.uk/biblink/wp1/sgml/"
      >http://hosted.ukoln.ac.uk/biblink/wp1/sgml/</link>
      ): it recommends keeping data in a specific DTD. It makes the
      interesting observation that the three DTDs could be
      combined: "<span class="i">one might use the EAD scheme to describe
      individual archival holdings down to the item level and then
      use TEI headers to describe individual documents, where these
      were deemed of sufficient importance to warrant the effort.
      Equally, one could embed CIMI topic descriptors within an
      otherwise purely TEI conformant document.</span>" (in
      Conclusion, 5.3 Use of schemes in combination-- <link href= 
      "http://hosted.ukoln.ac.uk/biblink/wp1/sgml/conclusion.html">
      http://hosted.ukoln.ac.uk/biblink/wp1/sgml/conclusion.html</link>
      ) This is the approach in use at the Bodleian Library at the
      University of Oxford.
    </p>                
    <p>
      Note: mappings between MARC and Dublin Core (and GILS)
      are available at the "Crosswalk" site <link href= 
      "http://www.loc.gov/marc/dccross.html">
      http://www.loc.gov/marc/dccross.html</link>. Use these mappings
      if possible. (See also <em>Mappings between Metadata
      Formats</em> compiled by Michael Day <link href= 
      "http://www.ukoln.ac.uk/metadata/interoperability/">
      http://www.ukoln.ac.uk/metadata/interoperability/</link> )
    </p>        
      </a>
    </qna>

    </section>
    <section id="sect4">
    <title>
      D. XML Questions
    </title>

<qna id='d1'><q class='h2' >
      D.1. What is XML?
    </q>  <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author>                           <a gist="eXtensible Markup Language" >
    <p>
      XML ( <link href="http://www.w3.org/XML/">
      http://www.w3.org/XML/</link >) is a simplified subset of SGML, the ISO
      Standard Generalized Markup Language. Most new WWW markup
      languages are now written using XML. See question B.3.
    </p>                                </a>
    </qna>
<qna id='d2'><q class='h2' >
      D.2 Is the standard XML attribute <span class='tt' >xml:lang</span> good enough
      for metadata?
    </q>                                    <a gist="Sometimes yes">
    <p>
      XML ( <link href="http://www.w3.org/XML/">
      http://www.w3.org/XML/</link >) provides a standard attribute
      <span class='tt' >xml:lang</span>,
      which can be used on any element to set the language. See the
      Chinese XML FAQ for details (<link href= 
      "http://www.ascc.net/xml/en/utf-8/faq.html#zh_xml_q15">http://www.ascc.net/xml/en/utf-8/faq.html#zh_xml_q15</link >).
      HTML also provides an equivalent attribute: 
      <span class='tt' >lang</span>.
    </p>
    <p>
      The xml:lang attribute specifies the language used in an
      element's content (and, presumably, its attributes' values).
      If the element is also a link (e.g., 
      <span class='tt' >&lt;a href="xxx.xml" xml:lang="en"&gt;a
      link&lt;/a&gt;</span>) the attribute specifies the
      language of "a link", and not (except by implication) the
      value of the target file "xxx.xml".
    </p>
    <p>
      This attribute uses the format of the Internet standard: RFC
      1766 <span class='i' >Tags for the Identification of Languages</span> ( <link
      href= 
      "http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1766.txt">
      http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1766.txt</link >
      ), which is best used with the following conventions:
    </p>
<p class='pre' xml:space='preserve' >
(( "x-" lll ) | ll )( "-" CC ( "-" xx )* )? 
</p>
    <p>
      where
    </p>
    <div class="ul">
      <p class='li'>
        <span class='b' >x-</span> means the code is not registered: avoid
        this if possible;
      </p>
      <p class='li'>
        <span class='b' >lll</span> is the ISO 639 <span class='i' >Code for the
        representation of names of languages</span> three-letter
        language code (lower case) or something of your own
        invention;
      </p>
      <p class='li'>
        <span class='b' >ll</span> is the ISO 639 <span class='i' >Code for the
        representation of names of languages</span> two-letter
        language code (lower case);
      </p>
      <p class='li'>
        <span class='b' >CC</span> is the ISO 3166 <span class='i' >Code for the
        representation of names of countries</span> locale code
        (upper case);
      </p>
      <p class='li'>
        <span class='b' >xx</span> is any other useful information (e.g.
        script or dialect: see question A.4.), at most 8
        alphabetic letters.
      </p>
      </div>
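    <p>
      A few values that fit this pattern are shown below on a
      hypothetical <span class='tt' >title</span> element (the element
      name is made up for illustration; the last value reuses the extra
      subtag that question D.3 uses for Min Nan):
    </p>
<p class='pre' xml:space='preserve' >
<![CDATA[<!-- ll: language only -->
<title xml:lang="zh">&#x4E2D;</title>

<!-- ll-CC: language and locale -->
<title xml:lang="zh-TW">&#x4E2D;</title>

<!-- x-lll-CC: unregistered three-letter code, then locale -->
<title xml:lang="x-zho-TW">&#x4E2D;</title>

<!-- ll-CC-xx: language, locale and one extra subtag -->
<title xml:lang="zh-TW-CFR">&#x4E2D;</title>]]>
</p>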
    <p>
      But note that "<span class='i' >the two-character language codes of ISO
      639 are recognized as being inadequate for use as SGML
      language attributes when tagging text</span>" (Robin Cover, <link
      href="http://www.oasis-open.org/cover/iso639a.html">
      http://www.oasis-open.org/cover/iso639a.html</link >). This means
      that, for metadata, the  <span class='tt' >
      xml:lang</span> attribute is mainly geared to
      providing information in a format that WWW tools will use.
      See question D.3 for more.
    </p>
    <p>
      Software which uses the  <span class='tt' >
      lang</span> attribute should match based on partial
      patterns, not exact matches based on the full pattern. In
      other words, if your software is looking for any Chinese
      text, it should accept "zh-TW", "zh-HK", etc., as well as
      simple "zh".
    </p>
    <p>
      ISO 639 has been extended with two (!) slightly different sets
      of 3-letter codes (see <link href= 
      "http://www.oasis-open.org/cover/iso639a.html">
      http://www.oasis-open.org/cover/iso639a.html</link >): one based
      on MARC/NISO/Z39.53 codes (see <link href= 
      "http://www.oasis-open.org/cover/bib-mn.html#nisoZ3953-1994">
      http://www.oasis-open.org/cover/bib-mn.html#nisoZ3953-1994</link >
      and <link href="http://lcweb.loc.gov/marc/langann.html">
      http://lcweb.loc.gov/marc/langann.html</link >) and the other
      based on the native pronunciation of the language's name
      (e.g., for "Chinese", the former gives "chi" and the latter
      gives "zho".) These three-letter codes cannot be used directly in
      RFC 1766 attributes like  <span class='tt' >
      xml:lang</span>. Which three-letter code should you
      use? If you need backwards compatibility with MARC or Z39.53,
      then those codes are best: this is probably the case with
      many libraries. However, the other codes are not so "English
      biased" and may be better for future systems. (The general
      WWW principle of "<span class='i' >be conservative in what you send, and
      generous in what you accept</span>" means that good systems in
      the future should try to accept both.)
    </p>                                       </a>
    </qna>
<qna id='d3'><q class='h2' >
      D.3 How can I represent "pinyin" or "traditional" or
      "simplified"?
    </q> <author>
                <name>Rick Jelliffe</name>
                <email>ricko@gate.sinica.edu.tw</email>
        </author> <a gist="Use attributes" >
    <p>
      There is a big difference between "language" and "script".
      However, WWW internationalization treats the two together.
      That is simpler, but is probably not good enough for serious
      metadata and cataloging. Many languages can be written in
      multiple scripts: especially languages of nations which have
      experienced colonization of various kinds (political,
      economic, cultural, religious, etc).
    </p>
    <p>
      There is now a draft ISO standard for names of scripts: ISO 15924,
      <span class='i' >Code for the representation of names of scripts.</span>
      Information can be found at <link href= 
      "http://www.oasis-open.org/cover/related.html#iso15924">
      http://www.oasis-open.org/cover/related.html#iso15924</link >.
      This standard was not available at the time the XML
      specification was written. (As of December 1998 it is a
      "Committee Draft", a stage that comes before being
      accepted as a standard. See <link href= 
      "http://www.indigo.ie/egt/standards/iso15924/document/cd15924.pdf">
      http://www.indigo.ie/egt/standards/iso15924/document/cd15924.pdf</link >
      .)
    </p>
    <p>
      For Chinese-related scripts:
     </p><p>
          <span class='b' >3-letter code</span> : <span class='b' >2-letter code</span> : <span class='b' >code number</span> : <span class='b' >English name</span>
      </p>
      <p>Bod : Bo : 330 : Tibetan</p>
      <p>Bpm : Bp : 285 : <span class='i' >Bopomofo</span> (Chinese)</p>
      <p>Han : Hn : 500 : <span class='i' >Han</span> ideographs</p>
      <p>Hgl : Hg : 420 : <span class='i' >Hangul</span> (Korean)</p>
      <p>Hrg : Hr : 410 : <span class='i' >Hiragana</span> (Japanese)</p>
      <p>Khn : Kh : 931 : Hgl + Han (Korean)</p>
      <p>Jap : Ja : 930 : Han + Hrg + Kkn (Japanese)</p>
      <p>Kkn : Kn : : <span class='i' >Katakana</span> (Japanese, Okinawan)</p>
      <p>Lat : Lt : : Latin letters (e.g. for Pinyin, Vietnamese, Japanese <span class='i' >romaji</span>)</p>
    <p>
      This standard is very helpful. But it does not provide a way
      to say "simplified" or "traditional".
    </p>   
    <p>
      One useful approach might be:
    </p>
     <div class="ul">
      <p class='li'>
        use the <span class='tt' >
        xml:lang</span> (or HTML 
        <span class='tt' >lang</span>) attribute to indicate
        locale-based characteristics (language, locale, dialect);
      </p>
      <p class='li'>
        use a <span class='tt' >
        script</span> attribute to indicate specific script
        characteristics.
      </p>
     </div>
    <p>
      One possible format might be an extended form of RFC 1766
      <span class='i' >Tags for the Identification of Languages</span> ( <link href= 
      "http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1766.txt">
      http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1766.txt</link >
      )
    </p>
<p class='pre'   xml:space='preserve' >
Sss ( "-" lll ( "-" CC ( "-" xx )*)?)?
</p>
    <p>
      where
    </p>
      <div class="ul">    
      <p class='li'>
        <span class='b' >Sss</span> is the 3-letter ISO 15924 script;
      </p>
      <p class='li'>
        <span class='b' >lll</span> is the ISO 639 (see question D.2
        above) three-letter language code (lower case)--if the
        language is not available, use the SIL Ethnologue codes at
        <link href="http://www.sil.org/ftp/ethnolog13/langcodes.tdf">
        http://www.sil.org/ftp/ethnolog13/langcodes.tdf</link > (upper
        case);
      </p>
      <p class='li'>
        <span class='b' >CC</span> is the ISO 3166 locale code (upper
        case);
      </p>
      <p class='li'>
        <span class='b' >xx</span> is any other useful information.
      </p>
      </div>
    
    <p>
      So, for example, for <span class='b' >simplified</span> Chinese
      script, writing the official (i.e., the default) Mandarin dialect:
    </p>
<p class='pre'   xml:space='preserve' >
<![CDATA[<p xml:lang="zh-CN" script="Han-zho-CN-simplified">&#x4E2D;</p>]]>
</p>
    <p>
      For <span class='b' >Pinyin</span>:
    </p>
<p class='pre' xml:space='preserve' >
<![CDATA[<p xml:lang="zh-CN" script="Lat-zho-CN-pinyin">zh&#x014D;ng</p>]]>
</p>
    <p>
      For <span class='b' >traditional</span> Chinese script, writing the
      Taiwanese dialect of the Min Nan Chinese language (if that is
      important):
    </p>
<p class='pre'   xml:space='preserve' >
<![CDATA[<p xml:lang="zh-TW-CFR" script="Han-zho-TW-traditional">&#x4E2D;</p>]]>
</p>
    <p>
      For <span class='b' >traditional</span> Chinese script, writing the
      (Taiwanese Aboriginal Austronesian) Amis language (does this
      ever happen?):
    </p>
<p class='pre'  xml:space='preserve' >
<![CDATA[<p xml:lang="x-map-TW-ALV" 
script="Han-zho-TW-traditional">&#x4E2D;</p>]]>
</p>
    <p>
      For Central <span class='b' >Okinawan</span> (see Ethnologue <link
      href="http://www.sil.org/ethnologue/countries/Japa.html">
      http://www.sil.org/ethnologue/countries/Japa.html</link >),
      writing in <span class='i' >katakana</span>:
    </p>
<p class='pre'  xml:space='preserve' >
xml:lang="ja-JP-RYU" script="Kkn"
</p>
    <p>
      If you use the script attribute in this form, you can use the
      following namespace declaration:
    </p>
<p class='pre'  xml:space='preserve' >
xmlns:ascc-dcfaq=<link href= 
"http://www.ascc.net/xml/en/utf-8/dc-faq.html">"http://www.ascc.net/xml/en/utf-8/dc-faq.html"</link >
</p>
    <p>
      and then use the attribute name  
      <span class='tt' >ascc-dcfaq:script</span>.
    </p>
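    <p>
      Put together, a fragment using the namespace-qualified attribute
      might look like the following sketch (the
      <span class='tt' >record</span> and <span class='tt' >title</span>
      element names are placeholders, not part of any DTD discussed here):
    </p>
<p class='pre' xml:space='preserve' >
<![CDATA[<record xmlns:ascc-dcfaq="http://www.ascc.net/xml/en/utf-8/dc-faq.html">
  <title xml:lang="zh-TW"
         ascc-dcfaq:script="Han-zho-TW-traditional">&#x4E2D;</title>
</record>]]>
</p>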
    <p>
      Using namespaces, you can keep compatibility with different
      methods of marking up scripts and language. Until a good
      standard system arrives, you may need multiple attributes.
    </p>    </a>
    </qna> 
    </section>
    <section class="dc" id="dc">
    <title>
      Cataloging Information (Dublin Core)
    </title>
<p class='pre-dc'  xml:space='preserve' ><![CDATA[
<DC:TITLE       xml:lang="en">The Chinese Metadata FAQ </DC:TITLE>
<DC:CREATOR                  >Rick Jelliffe </DC:CREATOR>
<DC:SUBJECT     xml:lang="en">Dublin Core, DC, Resource Description Framework,
                              RDF, EAD, Encoded Archival Description,
                              Warwick Framework, XML, SGML, Chinese, FAQ
                              </DC:SUBJECT>
<DC:DESCRIPTION xml:lang="en">Frequently Asked Questions about using XML-based metadata, 
                              including for Chinese </DC:DESCRIPTION>
<DC:PUBLISHER   xml:lang="en">Computing Centre, Academia Sinica, Taiwan </DC:PUBLISHER>
<DC:TYPE        xml:lang="en">Text.Article </DC:TYPE>
<DC:DATE                     >1998-03-06 </DC:DATE>
<DC:RIGHTS                   >]]><link href= 
"http://www.ascc.net/xml/en/utf-8/legal.html"
>http://www.ascc.net/xml/en/utf-8/legal.html</link ><![CDATA[</DC:RIGHTS>]]>
       
</p>
</section>
  </body>
</faq>

