big5 - A character encoding system (codeset) for Traditional
Chinese
The big5 codeset is one of several codesets that support
the Traditional Chinese language. This codeset includes
the following character sets: ASCII and Big-5.
The big5 codeset uses a combination of single-byte data
and two-byte data to represent ASCII characters, symbols,
and Chinese ideographic characters.
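As a rough illustration of how single-byte and two-byte data can share one byte stream, the following Python sketch splits big5 data into characters using the MSB rules described in the sections that follow. The function name split_big5 is hypothetical and is not part of the operating system; the sketch does not check that a two-byte code falls in a valid range.

```python
def split_big5(data: bytes) -> list:
    """Split big5 data into one-byte (ASCII) and two-byte (Big-5) units.

    Hypothetical helper for illustration only. A byte with its MSB off
    is taken as an ASCII character; a byte with its MSB on is taken as
    the first byte of a two-byte Big-5 code.
    """
    chars = []
    i = 0
    while i < len(data):
        if data[i] & 0x80 == 0:
            # MSB off: single-byte ASCII character
            chars.append(data[i:i + 1])
            i += 1
        else:
            # MSB on: first byte of a two-byte code
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character")
            chars.append(data[i:i + 2])
            i += 2
    return chars


# The second byte of a Big-5 code may have its MSB off (0x40 here),
# so the stream has to be scanned from the start to find boundaries.
print(split_big5(b"A\xa4\xa4B\xa4\x40"))
```

Because a second byte can look like an ASCII byte, character boundaries cannot be found reliably by starting in the middle of the data.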
ASCII Characters
All ASCII characters are represented in the form of single-byte,
7-bit data in the big5 codeset; that is, the
most significant bit (MSB) of a byte that represents an
ASCII character is always off (0). For more information,
see ascii(5).
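The MSB test for ASCII bytes can be sketched in Python as follows. The helper name is_ascii_byte is an assumption made for this example.

```python
def is_ascii_byte(b: int) -> bool:
    """Return True if the byte's most significant bit is off (0x00-0x7F)."""
    return (b & 0x80) == 0


# Every byte of plain ASCII text passes the test.
print(all(is_ascii_byte(b) for b in b"big5 codeset"))
# A byte with the MSB on, such as 0xA4, does not.
print(is_ascii_byte(0xA4))
```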
Big-5 Character Groups
The Big-5 character set defines the following character
groups:
Special symbols (408)
Level 1 characters (5401)
Level 2 characters (7652)
Level 1 user-defined space (785)
Level 2 user-defined space (2983)
Level 3 user-defined space (2041)
Code Values for Big-5 Characters
Each Big-5 character is represented by a two-byte code
that complies with the Big-5 standard. The MSB of
the first byte is always on, while that of the second
byte can be on or off. Code ranges for characters in the
different character groups are as follows:
Special symbols: A140 to A3BF
Level 1 characters: A440 to C67E
Level 2 characters: C940 to F9D5
Level 1 user-defined space: FA40 to FEFE
Level 2 user-defined space: 8E40 to A0FE
Level 3 user-defined space: 8140 to 8DFE
In this code space, the valid range for the first
byte is 81 to FE, while the valid ranges for the second
byte are 40 to 7E and A1 to FE.
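The code ranges and byte rules above can be checked with a short Python sketch. The names GROUPS, is_valid_pair, classify, and count_codes are hypothetical and exist only for this example. Counting the valid codes in each range reproduces the group sizes listed under Big-5 Character Groups, because each first byte allows 63 + 94 = 157 second bytes.

```python
# (group name, first code, last code), copied from the list above.
GROUPS = [
    ("Special symbols", 0xA140, 0xA3BF),
    ("Level 1 characters", 0xA440, 0xC67E),
    ("Level 2 characters", 0xC940, 0xF9D5),
    ("Level 1 user-defined space", 0xFA40, 0xFEFE),
    ("Level 2 user-defined space", 0x8E40, 0xA0FE),
    ("Level 3 user-defined space", 0x8140, 0x8DFE),
]


def is_valid_pair(first: int, second: int) -> bool:
    """First byte 81-FE; second byte 40-7E or A1-FE."""
    return 0x81 <= first <= 0xFE and (
        0x40 <= second <= 0x7E or 0xA1 <= second <= 0xFE
    )


def classify(code: int):
    """Return the group name for a two-byte code, or None if it has none."""
    if not is_valid_pair(code >> 8, code & 0xFF):
        return None
    for name, low, high in GROUPS:
        if low <= code <= high:
            return name
    return None


def count_codes(low: int, high: int) -> int:
    """Count the codes in [low, high] whose two bytes are both valid."""
    return sum(
        1 for code in range(low, high + 1)
        if is_valid_pair(code >> 8, code & 0xFF)
    )


for name, low, high in GROUPS:
    print(f"{name}: {count_codes(low, high)}")
print(classify(0xA4A4))
```

A code such as A47F is rejected because 7F is outside both second-byte ranges, and a code such as C6A1 has valid bytes but falls between the listed group ranges.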
Codeset Conversion
The following codeset converter pairs are available for
converting Traditional Chinese characters between big5 and
other encoding formats. Refer to iconv_intro(5) for an
introduction to codeset conversion. For more information
about the other codeset for which big5 is the input or
output, see the reference page specified in the list item.
dechanyu_big5, big5_dechanyu
Converting from and to DEC Hanyu: dechanyu(5)
dechanzi_big5, big5_dechanzi
Converting from and to DEC Hanzi: dechanzi(5)
eucTW_big5, big5_eucTW
Converting from and to Taiwanese Extended UNIX
Code: eucTW(5)
sbig5_big5, big5_sbig5
Converting from and to Shift Big-5: sbig5(5)
telecode_big5, big5_telecode
Converting from and to Telecode: telecode(5)
UTF-16_big5, big5_UTF-16
Converting from and to UTF-16: Unicode(5)
UCS-4_big5, big5_UCS-4
Converting from and to UCS-4: Unicode(5)
UTF-8_big5, big5_UTF-8
Converting from and to UTF-8: Unicode(5)
Note
The big5 encoding format is identical to the encoding format
used in PC code pages that support Traditional Chinese.
Therefore, you can use codeset converters that convert
between big5 and UTF-16, UCS-4, or UTF-8 to convert
Traditional Chinese data between PC code-page and Unicode
encoding formats. Refer to code_page(5) for a discussion
of how the operating system supports PC code pages.
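To illustrate conversion between big5 and the Unicode encoding formats, the following Python sketch uses Python's built-in big5 codec as a stand-in. It does not call the operating system's codeset converters, and its mappings may differ from theirs for some characters; in particular, it is not expected to handle the user-defined spaces. The convert function is a hypothetical helper.

```python
def convert(data: bytes, from_codeset: str, to_codeset: str) -> bytes:
    """Decode data from one codeset and re-encode it in another.

    Uses Python codec names, which are an assumption of this sketch
    and are not the converter names listed on this reference page.
    """
    return data.decode(from_codeset).encode(to_codeset)


# Two Big-5 Level 1 ideographs (U+4E2D, U+6587) plus an ASCII letter.
big5_data = "\u4e2d\u6587A".encode("big5")

utf8_data = convert(big5_data, "big5", "utf-8")
utf16_data = convert(big5_data, "big5", "utf-16-be")
ucs4_data = convert(big5_data, "big5", "utf-32-be")

# Byte counts for the same three characters in each encoding format.
print(len(big5_data), len(utf8_data), len(utf16_data), len(ucs4_data))

# Converting back yields the original big5 bytes.
print(convert(utf8_data, "utf-8", "big5") == big5_data)
```

The ASCII letter stays a single byte in both big5 and UTF-8, while each ideograph takes two bytes in big5 and three in UTF-8.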
Fonts for Big-5 Characters
The operating system supports Big-5 code by internally
converting characters to DEC Hanyu. Therefore, DEC Hanyu
fonts are used for Big-5 characters. Both display and
printer fonts are provided for DEC Hanyu and these are
listed in the dechanyu(5) reference page.
For general information about printer support for and
codeset conversion of Asian text, refer to i18n_printing(5).
Commands: locale(1)
Others: ascii(5), Chinese(5), code_page(5), dechanyu(5),
dechanzi(5), eucTW(5), GB18030(5), GBK(5), i18n_intro(5),
i18n_printing(5), iconv_intro(5), l10n_intro(5), sbig5(5),
telecode(5), Unicode(5)
big5(5)