Chinese Text Processing FAQ

This FAQ gives information on localizing text processing systems to work on Chinese text. The focus is on systems which generate or use XML, HTML, SGML or XHTML.

This FAQ does not deal with locale issues such as dates, times, currency, user interface language, or user interface cultural conventions.

A. General

Is it difficult to support Chinese text processing?

Where do I start?

But my program uses 8-bit characters! Can I do Chinese?

How can users enter Chinese characters in non-Chinese computer systems?

What else should I look for?

B. Line Breaks

How should I generate line breaks for Chinese text?

How should I generate line breaks for Chinese marked-up text?

How wide is a Chinese character in constant-width fonts?

How should I generate line breaks for Chinese typeset text?

What about letter-quality Chinese publications?

What about good-quality Chinese publications?

How should I treat whitespace in Chinese text?

C. Searching and Indexing

How do I search or index Chinese documents?

How do I search or index using characters?

What if I don't want to make an index, just search?

How do I search or index using words?

