Chinese Text Processing FAQ

This FAQ gives information on localizing text processing systems to work on Chinese text. The focus is on systems which generate or use XML, HTML, SGML or XHTML.

This FAQ does not deal with locale issues such as dates, times, currency, user interface language, or user interface cultural conventions.

For more information on related topics, see the Chinese XML FAQ.

A. General

Is it difficult to support Chinese text processing?

Where do I start?

But my program uses 8-bit characters! Can I do Chinese?

How can users enter Chinese characters in non-Chinese computer systems?

What else should I look for?

B. Line Breaks

How should I generate line breaks for Chinese text?

How should I generate line breaks for Chinese marked-up text?

How wide is a Chinese character in constant-width fonts?

How should I generate line breaks for Chinese typeset text?

What about letter-quality Chinese publications?

What about good-quality Chinese publications?

How should I treat whitespace in Chinese text?

C. Searching and Indexing

How do I search or index Chinese documents?

How do I search or index using characters?

What if I don't want to make an index, just search?

How do I search or index using words?

Cataloging Information (Dublin Core)

<DC:TITLE       >Chinese Text Processing FAQ</DC:TITLE>
<DC:CREATOR     >Rick Jelliffe</DC:CREATOR>
<DC:SUBJECT     >localization, Chinese, internationalization, l10n, i18n,
   typesetting, XML, HTML, SGML, XHTML, text processing,
   line break, linebreak, word wrap, index, search</DC:SUBJECT>
<DC:DESCRIPTION >Frequently Asked Questions about processing Chinese text</DC:DESCRIPTION>
<DC:PUBLISHER   >Computing Centre, Academia Sinica, Taiwan</DC:PUBLISHER>
<DC:TYPE        >Text.Article</DC:TYPE>
<DC:DATE        >1999-05-12</DC:DATE>

FAQ maintained by: Rick Jelliffe

Expanding FAQ automatically generated from a QAML document
by QAML Tools from Academia Sinica Computing Center, Taipei, Taiwan