dechanyu - A character encoding system (codeset) for Traditional
Chinese
The DEC Hanyu (dechanyu) codeset consists of the following
sets of characters:
   ASCII
   The first and second character planes of CNS11643-1986
   Digital Taiwan Supplemental Character Set (DTSCS)
   User-defined characters
DEC Hanyu uses a combination of single-byte data, 2-byte
data, and 4-byte data to represent ASCII characters, symbols,
or ideographic characters.
ASCII Characters
All ASCII characters are represented as single-byte,
7-bit data in DEC Hanyu; that is, the most significant
bit (MSB) of a byte that represents an ASCII
character is always off (0). Refer to ascii(5) for more
information about the ASCII character set.
CNS11643-1986 Characters (Planes 1 and 2)
Each plane of the CNS 11643-1986 character set is divided
into 94 rows and each of these rows has 94 columns. The
characters defined in plane 1 and plane 2 of CNS
11643-1986 are as follows:
-----------------------------------------------------------------------
Character Plane   Character Type                    Number of Characters
-----------------------------------------------------------------------
1                 Special characters                651
                  Control characters                33
                  Frequently used characters        5401
2                 Less frequently used characters   7650
-----------------------------------------------------------------------
Note that the first two planes of the CNS11643-1986 character
set are the same as those specified for the revised
CNS11643-1992 character set.
In DEC Hanyu, each CNS 11643-1986 character is represented
by two bytes, in conformance with the CNS 11643-1986 standard.
The MSB of the first byte is always turned on while
that of the second byte is on for the first character
plane and off for the second character plane.
The first byte of CNS 11643-1986 encoding determines the
row number of the character, while the second byte determines
its column number. Code ranges for the two character
planes are as follows:
   Plane 1: A1A1 to FEFE
   Plane 2: A121 to FE7E
The following formulas determine the value of a CNS
11643-1986 character in relation to its row and column
numbers. For a CNS 11643-1986 Plane 1 character:
1st byte = A0(hex) + Row number
2nd byte = A0(hex) + Column number
For a CNS 11643-1986 Plane 2 character:
1st byte = A0(hex) + Row number
2nd byte = 20(hex) + Column number
For example, if a character is positioned at the first
column of the 36th row on CNS 11643 plane 1, its value is
C4A1, which is calculated as follows:
1st byte = A0(hex) + 36 = C4(hex)
2nd byte = A0(hex) + 01 = A1(hex)
Similarly, if a character is positioned at the first column
of the 36th row on CNS 11643 plane 2, its value is
C421, which is calculated as follows:
1st byte = A0(hex) + 36 = C4(hex)
2nd byte = 20(hex) + 01 = 21(hex)
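The two formulas and the worked examples above can be sketched in Python as follows. This is a minimal illustration, not part of the operating system: the function names encode_cns and decode_cns are hypothetical, and the range checks assume that rows and columns are numbered 1 to 94 as described earlier.

```python
from typing import Tuple


def encode_cns(plane: int, row: int, column: int) -> bytes:
    """Return the 2-byte DEC Hanyu code for a CNS 11643-1986 position.

    Hypothetical helper: it applies the row/column formulas from the text.
    """
    if plane not in (1, 2):
        raise ValueError("only planes 1 and 2 use the 2-byte form")
    if not (1 <= row <= 94 and 1 <= column <= 94):
        raise ValueError("row and column must be in the range 1..94")
    first = 0xA0 + row                        # MSB always on
    second = (0xA0 if plane == 1 else 0x20) + column
    return bytes([first, second])


def decode_cns(code: bytes) -> Tuple[int, int, int]:
    """Inverse of encode_cns: return (plane, row, column)."""
    if len(code) != 2:
        raise ValueError("expected exactly two bytes")
    first, second = code
    if not 0xA1 <= first <= 0xFE:
        raise ValueError("first byte outside A1..FE")
    row = first - 0xA0
    if 0xA1 <= second <= 0xFE:                # MSB on: plane 1
        return 1, row, second - 0xA0
    if 0x21 <= second <= 0x7E:                # MSB off: plane 2
        return 2, row, second - 0x20
    raise ValueError("second byte outside A1..FE and 21..7E")
```

For instance, encode_cns(1, 36, 1).hex() gives "c4a1" and encode_cns(2, 36, 1).hex() gives "c421", matching the two examples above.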
DTSCS Characters
Currently, only the EDPC (Electronic Data Processing Centre)
Recommended Character Set, which defines a total of
6319 characters (rows 1 to 68), is included in the Digital
Taiwan Supplemental Character Set (DTSCS). In the revised
CNS 11643-1992 standard, the 6319 characters in the EDPC
Recommended Character Set are assigned to the third and
fourth character planes as follows:
---------------------------------------------------------
EDPC Characters   Character Plane   Number of Characters
---------------------------------------------------------
Part I            Plane 3           6148
Part II           Plane 4           171
---------------------------------------------------------
The characters defined in Plane 3 and Plane 4 of CNS
11643-1992 are as follows:
--------------------------------------------------------------------------
Character Plane   Character Type                          Number of
                                                          Characters
--------------------------------------------------------------------------
3                 Rarely-used characters (EDPC Part I)    6148
4                 Used for residency system, ISO 2nd      7298
                  edition DIS 10646 Han characters,
                  171 EDPC Part II characters
--------------------------------------------------------------------------
In DEC Hanyu, each DTSCS character is represented by a
4-byte value. The first two bytes are the leading value,
specifically C2CB, which is used as a designator sequence
for the DTSCS character set. The MSB of the third and
fourth bytes is set on for the EDPC Recommended Character
Set.
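Taken together, the MSB rules for ASCII, CNS 11643-1986, and DTSCS characters are enough to split a DEC Hanyu byte string into characters. The following Python sketch illustrates one way to do that. The function name split_dechanyu and the labels it yields are hypothetical, and raising an error on truncated input is an assumption, since the text does not say how malformed data is handled. User-defined characters share the 2-byte form, so the sketch reports them under the plane they occupy.

```python
from typing import Iterator, Tuple

DTSCS_LEAD = b"\xC2\xCB"  # designator sequence for DTSCS, as given in the text


def split_dechanyu(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (kind, raw_bytes) for each character in a DEC Hanyu byte string."""
    i, n = 0, len(data)
    while i < n:
        if data[i] < 0x80:
            # MSB off on the first byte: single-byte ASCII
            kind, width = "ascii", 1
        elif data[i:i + 2] == DTSCS_LEAD:
            # Leading C2CB: 4-byte DTSCS character
            kind, width = "dtscs", 4
        else:
            # MSB on: 2-byte CNS 11643-1986 character; the MSB of the
            # second byte selects the plane (on = plane 1, off = plane 2)
            second = data[i + 1] if i + 1 < n else 0
            kind = "cns-plane1" if second & 0x80 else "cns-plane2"
            width = 2
        if i + width > n:
            raise ValueError(f"truncated {kind} character at offset {i}")
        yield kind, data[i:i + width]
        i += width
```

As a usage example, list(split_dechanyu(b"A\xc4\xa1\xc4\x21")) yields one ASCII character followed by one plane 1 and one plane 2 character.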
User-Defined Characters
In addition to the two Chinese character sets described in
preceding sections, DEC Hanyu provides an area of 3587
positions for user-defined characters (UDC). The positions
for UDC are those positions that are unused (but not
reserved) code points on the first and second character
planes of CNS 11643-1986.
The encoding for UDC is exactly the same as that for
CNS11643-1986 except that the two sets of characters
occupy different regions. Code ranges for UDC are as
follows:
-----------------------------------------------
Character Plane   Number of UDC   Code Range
-----------------------------------------------
1                 145             FDCC to FEFE
1                 2256            AAA1 to C1FE
2                 1186            F245 to FE7E
-----------------------------------------------
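The counts in this table can be checked against the code ranges with a short calculation. The Python sketch below assumes that each range runs row by row across all 94 columns of a plane; the text does not state this, but the resulting counts match the table and add up to the 3587 positions mentioned above. The names positions_in_range, UDC_RANGES, and UDC_COUNTS are hypothetical.

```python
def positions_in_range(start: int, end: int, column_base: int) -> int:
    """Count code points from start to end inclusive.

    Assumes the range is walked row by row over 94 columns.
    column_base is A0(hex) for plane 1 and 20(hex) for plane 2.
    """
    def linear(code: int) -> int:
        row = (code >> 8) - 0xA0             # first byte gives the row
        column = (code & 0xFF) - column_base  # second byte gives the column
        return (row - 1) * 94 + (column - 1)

    return linear(end) - linear(start) + 1


# (plane, first code, last code), copied from the table
UDC_RANGES = [
    (1, 0xFDCC, 0xFEFE),
    (1, 0xAAA1, 0xC1FE),
    (2, 0xF245, 0xFE7E),
]

UDC_COUNTS = [
    positions_in_range(start, end, 0xA0 if plane == 1 else 0x20)
    for plane, start, end in UDC_RANGES
]

print(UDC_COUNTS, sum(UDC_COUNTS))  # [145, 2256, 1186] 3587
```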
Codeset Conversion
The following codeset converter pairs are available for
converting Traditional Chinese characters between dechanyu
and other encoding formats. Refer to iconv_intro(5) for
an introduction to codeset conversion. For more information
about the other codeset for which dechanyu is the
input or output, see the reference page specified in the
list item.
big5_dechanyu, dechanyu_big5
     Converting from and to the Big-5 codeset: big5(5).
     Note that Big-5 encoding is equivalent to the
     Microsoft code-page format used on PCs for Traditional
     Chinese. See code_page(5) for information
     about PC code pages.
dechanzi_dechanyu, dechanyu_dechanzi
     Converting from and to the DEC Hanzi codeset:
     dechanzi(5).
eucTW_dechanyu, dechanyu_eucTW
     Converting from and to Taiwanese Extended UNIX
     Code: eucTW(5).
telecode_dechanyu, dechanyu_telecode
     Converting from and to the Telecode codeset: telecode(5).
UTF-16_dechanyu, dechanyu_UTF-16
     Converting from and to UTF-16 format: Unicode(5).
UCS-4_dechanyu, dechanyu_UCS-4
     Converting from and to UCS-4 format: Unicode(5).
UTF-8_dechanyu, dechanyu_UTF-8
     Converting from and to UTF-8 format: Unicode(5).
Fonts for DEC Hanyu Characters
The operating system provides both screen and printer
fonts for DEC Hanyu characters.
The following DECwindows Motif fonts are grouped according
to character set and family; they reflect various sizes
and typefaces for 75dpi and 100dpi display devices:
CNS 11643-1986 fonts (Hei family):
-adecw-hei-medium-r-normal--16-160-75-75-m-160-dec.cns11643.1986-2
-adecw-hei-medium-r-normal--24-240-75-75-m-240-dec.cns11643.1986-2
-adecw-hei-medium-r-normal--16-160-100-100-m-160-dec.cns11643.1986-2
-adecw-hei-medium-r-normal--24-240-100-100-m-240-dec.cns11643.1986-2
CNS 11643-1986 fonts (Screen family):
-adecw-screen-medium-r-normal--18-180-75-75-m-160-dec.cns11643.1986-2
-adecw-screen-medium-r-normal--24-240-75-75-m-240-dec.cns11643.1986-2
-adecw-screen-medium-r-normal--18-180-100-100-m-160-dec.cns11643.1986-2
-adecw-screen-medium-r-normal--24-240-100-100-m-240-dec.cns11643.1986-2
-adecw-screen-medium-r-normal--18-180-100-100-m-160-dec.cns11643.1986-UDC
-adecw-screen-medium-r-normal--24-240-100-100-m-240-dec.cns11643.1986-UDC
CNS 11643-1986 fonts (Sung family):
-adecw-sung-medium-r-normal--24-240-75-75-m-240-dec.cns11643.1986-2
-adecw-sung-medium-r-normal--32-320-75-75-m-320-dec.cns11643.1986-2
-adecw-sung-medium-r-normal--24-240-100-100-m-240-dec.cns11643.1986-2
-adecw-sung-medium-r-normal--32-320-100-100-m-320-dec.cns11643.1986-2
DTSCS fonts (Hei family):
-adecw-hei-medium-r-normal--16-160-75-75-m-160-dec.dtscs.1990-2
-adecw-hei-medium-r-normal--24-240-75-75-m-240-dec.dtscs.1990-2
-adecw-hei-medium-r-normal--16-160-100-100-m-160-dec.dtscs.1990-2
-adecw-hei-medium-r-normal--24-240-100-100-m-240-dec.dtscs.1990-2
DTSCS fonts (Screen family):
-adecw-screen-medium-r-normal--18-180-75-75-m-160-dec.dtscs.1990-2
-adecw-screen-medium-r-normal--24-240-75-75-m-240-dec.dtscs.1990-2
-adecw-screen-medium-r-normal--18-180-100-100-m-160-dec.dtscs.1990-2
-adecw-screen-medium-r-normal--24-240-100-100-m-240-dec.dtscs.1990-2
DTSCS fonts (Sung family):
-adecw-sung-medium-r-normal--24-240-75-75-m-240-dec.dtscs.1990-2
-adecw-sung-medium-r-normal--32-320-75-75-m-320-dec.dtscs.1990-2
-adecw-sung-medium-r-normal--24-240-100-100-m-240-dec.dtscs.1990-2
-adecw-sung-medium-r-normal--32-320-100-100-m-320-dec.dtscs.1990-2
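The font names listed above follow the 14-field X Logical Font Description (XLFD) layout, so the size, resolution, and character set of each font can be read from its name. The Python sketch below splits a name into those fields. The function name parse_xlfd and the dictionary keys are hypothetical labels chosen for this illustration; the point-size field is expressed in tenths of a point, so 160 means 16 points.

```python
XLFD_FIELDS = (
    "foundry", "family", "weight", "slant", "setwidth", "add_style",
    "pixel_size", "point_size", "resolution_x", "resolution_y",
    "spacing", "average_width", "charset_registry", "charset_encoding",
)


def parse_xlfd(name: str) -> dict:
    """Split an XLFD font name into its 14 hyphen-separated fields."""
    if not name.startswith("-"):
        raise ValueError("an XLFD name starts with a hyphen")
    parts = name[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError(f"expected 14 fields, found {len(parts)}")
    return dict(zip(XLFD_FIELDS, parts))


font = parse_xlfd(
    "-adecw-hei-medium-r-normal--16-160-75-75-m-160-dec.cns11643.1986-2"
)
# family "hei", 16 pixels high, designed for a 75dpi display
print(font["family"], font["pixel_size"], font["resolution_x"])
```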
The operating system provides the following PostScript
printer fonts for CNS 11643-1986 characters:
   Hei-Light-CNS11643
   Sung-Light-CNS11643
These PostScript fonts support only the Traditional Chinese
characters in planes 1 and 2 of the CNS 11643 character
set. The Traditional Chinese characters in the DTSCS
character set are not supported by printer fonts. The
restriction also applies to the eucTW codeset, which also
includes DTSCS characters and is supported by the same
fonts as dechanyu.
For general information on printing Asian language text,
refer to i18n_printing(5).
Commands: locale(1)
Others: ascii(5), big5(5), Chinese(5), code_page(5),
dechanzi(5), eucTW(5), GBK(5), i18n_intro(5), i18n_printing(5), iconv_intro(5), l10n_intro(5), sbig5(5), telecode(5)
dechanyu(5)