chrtbl(1M) chrtbl(1M)
chrtbl - generate character classification and conversion tables
chrtbl [file]
The chrtbl command creates two tables containing information on character
classification, upper/lowercase conversion, character-set width, and
numeric formatting. One table is an array of (2*257*4) + 7 bytes that is
encoded so a table lookup can be used to determine the character
classification of a character, convert a character [see ctype(3C)], and
find the byte and screen width of a character in one of the supplementary
code sets. The other table contains information about the format of
non-monetary numeric quantities: the first byte specifies the decimal
delimiter; the second byte specifies the thousands delimiter; and the
remaining bytes comprise a null-terminated string indicating the grouping
(each element of the string is taken as an integer that indicates the
number of digits that comprise the current group in a formatted non-monetary
numeric quantity).
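
The layout of the numeric table described above can be sketched in a few
lines of Python. This is only an illustration of the byte order given in
the text (decimal delimiter, thousands delimiter, null-terminated grouping
string); the function name decode_numeric_table is hypothetical, and the
sketch does not claim to read the actual LC_NUMERIC file that chrtbl
writes.

```python
def decode_numeric_table(table: bytes) -> dict:
    """Split a numeric-formatting table into its three described parts."""
    if len(table) < 3:
        raise ValueError("need two delimiter bytes and a terminating NUL")
    decimal_point = chr(table[0])   # first byte: decimal delimiter
    thousands_sep = chr(table[1])   # second byte: thousands delimiter
    end = table.index(0, 2)         # grouping string ends at the first NUL
    grouping = list(table[2:end])   # each element is a group size
    return {
        "decimal_point": decimal_point,
        "thousands_sep": thousands_sep,
        "grouping": grouping,
    }


if __name__ == "__main__":
    print(decode_numeric_table(b".,\x03\x00"))
```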
chrtbl reads the user-defined character classification and conversion
information from file and creates three output files in the current
directory. To construct file, use the file supplied in
/usr/lib/locale/C/chrtbl_C as a starting point. You may add entries, but
do not change the original values supplied with the system. For example,
for other locales you may wish to add eight-bit entries to the ASCII
definitions provided in this file.
One output file, ctype.c (a C language source file), contains a
(2*257*4)+7-byte array generated from processing the information from
file. You should review the content of ctype.c to verify that the array
is set up as you had planned. (In addition, an application program could
use ctype.c.) The first 257*4 bytes of the array in ctype.c are used for
storing 32-bit character classification for 257 characters. The
characters used for initializing these bytes of the array represent
character classifications that are defined in ctype.h; for example, _L
means a character is lowercase and _S|_B means the character is both a
spacing character and a blank. The second 257*4 bytes of the array are
used for character conversion and hold 514 elements of 16 bits each.
These bytes of the array are initialized so that characters for
which you do not provide conversion information will be converted to
themselves. When you do provide conversion information, the first value
of the pair is stored where the second one would be stored normally, and
vice versa; for example, if you provide <0x41 0x61>, then 0x61 is stored
where 0x41 would be stored normally, and 0x41 is stored where 0x61 would
be stored normally. The last 7 bytes are used for character width
information for up to three supplementary code sets.
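
The arithmetic behind the array size, and the swap performed for each
conversion pair, can be checked with a short Python sketch. The region
sizes come directly from the description above. The conversion table is
deliberately simplified to a 256-entry list indexed by character code; the
real array has 257 entries per region and an element layout that this page
does not spell out, so build_conversion is a hypothetical helper, not the
actual encoding.

```python
NUM_ENTRIES = 257
CLASS_BYTES = NUM_ENTRIES * 4   # 32-bit classification per character
CONV_BYTES = NUM_ENTRIES * 4    # 514 conversion elements of 16 bits each
WIDTH_BYTES = 7                 # width data for supplementary code sets
TOTAL_BYTES = CLASS_BYTES + CONV_BYTES + WIDTH_BYTES


def region_offsets() -> dict:
    """Return (start, end) byte offsets of each region, end exclusive."""
    conv_start = CLASS_BYTES
    width_start = CLASS_BYTES + CONV_BYTES
    return {
        "classification": (0, conv_start),
        "conversion": (conv_start, width_start),
        "width": (width_start, TOTAL_BYTES),
    }


def build_conversion(pairs):
    """Identity mapping, with each <upper lower> pair swapped in."""
    table = list(range(256))    # simplified: one entry per character code
    for upper, lower in pairs:
        table[upper] = lower    # 0x61 stored where 0x41 would normally be
        table[lower] = upper    # 0x41 stored where 0x61 would normally be
    return table


if __name__ == "__main__":
    print(TOTAL_BYTES, region_offsets())
    print(hex(build_conversion([(0x41, 0x61)])[0x41]))
```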
The second output file (a data file) contains the same information, but
is structured for efficient use by the character classification and
conversion routines [see ctype(3C)]. The name of this output file is the
value you assign to the keyword LC_CTYPE read in from file. Before this
file can be used by the character classification and conversion routines,
it must be installed in the /usr/lib/locale/locale directory with the
name LC_CTYPE by someone who is super-user or a member of group bin.
This file must be readable by user, group, and other; no other
permissions should be set. To use the character classification
and conversion tables in this file, set the LC_CTYPE environment variable
appropriately [see environ(5) or setlocale(3C)].
The third output file (a data file) is created only if numeric formatting
information is specified in the input file. The name of this output file
is the value you assign to the keyword LC_NUMERIC read in from file.
Before this file can be used, it must be installed in the
/usr/lib/locale/locale directory with the name LC_NUMERIC by someone who
is super-user or a member of group bin. This file must be readable by
user, group, and other; no other permissions should be set. To use the
numeric formatting information in this file, set the LC_NUMERIC
environment variable appropriately [see environ(5) or setlocale(3C)].
The name of the locale where you install the files LC_CTYPE and
LC_NUMERIC should correspond to the conventions defined in file. For
example, if French conventions were defined, and the name for the French
locale on your system is french, then you should install the files in
/usr/lib/locale/french.
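
The installation steps for both data files amount to a copy followed by a
permission change. The sketch below shows one way to express that in
Python, assuming a POSIX system. The function name install_locale_file and
its parameters are hypothetical; creating the locale directory when it is
missing is also an assumption, and the caller still needs the privileges
described above to write under /usr/lib/locale.

```python
import os
import shutil


def install_locale_file(src, locale_root, locale_name, category):
    """Copy a chrtbl data file into place and make it read-only for all."""
    if category not in ("LC_CTYPE", "LC_NUMERIC"):
        raise ValueError("category must be LC_CTYPE or LC_NUMERIC")
    dest_dir = os.path.join(locale_root, locale_name)
    os.makedirs(dest_dir, exist_ok=True)   # assumption: create if missing
    dest = os.path.join(dest_dir, category)
    shutil.copyfile(src, dest)
    os.chmod(dest, 0o444)   # readable by user, group, other; nothing else
    return dest


if __name__ == "__main__":
    # Example call; requires write access to /usr/lib/locale.
    # install_locale_file("usa", "/usr/lib/locale", "french", "LC_CTYPE")
    pass
```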
If no input file is given, or if the argument "-" is encountered, chrtbl
reads from standard input.
The syntax of file allows the user to define the names of the data files
created by chrtbl, the assignment of characters to character
classifications, the relationship between upper and lowercase letters,
byte and screen widths for up to three supplementary code sets, and three
items of numeric formatting information: the decimal delimiter, the
thousands delimiter, and the grouping. The keywords recognized by chrtbl
are:
LC_CTYPE        name of the data file created by chrtbl to contain
                character classification, conversion, and width
                information
isupper         character codes to be classified as uppercase letters
islower         character codes to be classified as lowercase letters
isalpha         character codes to be classified as letters
isdigit         character codes to be classified as numeric digits
isspace         character codes to be classified as white-space
                (delimiter) characters
ispunct         character codes to be classified as punctuation
                characters
iscntrl         character codes to be classified as control characters
isblank         character codes to be classified as blank characters
isprint         character codes to be classified as printing characters,
                including the space character
isgraph         character codes to be classified as printable
                characters, not including the space character
isxdigit        character codes to be classified as hexadecimal digits
ul              relationship between upper- and lowercase characters
cswidth         byte and screen width information (by default, each is
                one character wide)
LC_NUMERIC      name of the data file created by chrtbl to contain
                numeric formatting information
decimal_point   decimal delimiter
thousands_sep   thousands delimiter
grouping        string in which each element is taken as an integer that
                indicates the number of digits that comprise the current
                group in a formatted non-monetary numeric quantity.
Any lines with the number sign (#) in the first column are treated as
comments and are ignored. Blank lines are also ignored.
Characters for isupper, islower, isalpha, isdigit, isspace, ispunct,
iscntrl, isblank, isprint, isgraph, isxdigit, and ul can be represented
as a hexadecimal or octal constant (for example, the letter a can be
represented as 0x61 in hexadecimal or 0141 in octal). Hexadecimal and
octal constants may be separated by one or more space and/or tab
characters.
The dash character (-) may be used to indicate a range of consecutive
numbers. Zero or more space characters may be used for separating the
dash character from the numbers.
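
The constant and range notation described above can be illustrated with a
small Python parser. This is a sketch of the syntax only, not the parser
chrtbl itself uses; parse_constant and parse_codes are hypothetical names,
and the sketch is slightly more lenient than the text in that it also
accepts tabs around the dash.

```python
import re


def parse_constant(token: str) -> int:
    """Convert a hexadecimal (0x..) or octal (0..) constant to an integer."""
    if token[:2].lower() == "0x":
        return int(token, 16)
    if token.startswith("0"):
        return int(token, 8)
    raise ValueError("expected a hexadecimal or octal constant: %r" % token)


def parse_codes(spec: str) -> list:
    """Expand a list of constants and dash ranges into character codes."""
    codes = []
    # Remove optional spacing around each dash so a range is one token.
    for token in re.sub(r"\s*-\s*", "-", spec).split():
        low, dash, high = token.partition("-")
        first = parse_constant(low)
        last = parse_constant(high) if dash else first
        if last < first:
            raise ValueError("range runs backwards: %r" % token)
        codes.extend(range(first, last + 1))
    return codes


if __name__ == "__main__":
    print(parse_codes("0x20 0x9 - 0xd"))
```

For instance, the isspace line of a typical input file, 0x20 0x9 - 0xd,
expands to the space character followed by codes 9 through 13.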
The backslash character (\) is used for line continuation. Only a
carriage return is permitted after the backslash character.
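
The comment, blank-line, and continuation rules can be combined into a
single pass that turns the raw input into logical lines. The following
Python sketch is an assumption about how such a pass might look; the name
logical_lines is hypothetical. It collapses runs of white space for
readability, and it does not try to settle how chrtbl treats a comment
line that appears in the middle of a continued line.

```python
def logical_lines(text: str) -> list:
    """Drop comments and blank lines, and join backslash-continued lines."""
    result = []
    pending = ""
    for raw in text.splitlines():
        # Comments (# in the first column) and blank lines are ignored,
        # but only when no continuation is in progress.
        if not pending and (raw.startswith("#") or not raw.strip()):
            continue
        if raw.endswith("\\"):
            pending += raw[:-1] + " "   # continuation: keep accumulating
            continue
        result.append(" ".join((pending + raw).split()))
        pending = ""
    if pending:
        result.append(" ".join(pending.split()))
    return result


if __name__ == "__main__":
    sample = "# punctuation\nispunct 0x21 - 0x2f \\\n  0x5b - 0x60\n"
    print(logical_lines(sample))
```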
The relationship between upper- and lowercase letters (ul) is expressed
as ordered pairs of octal or hexadecimal constants: <uppercase_character
lowercase_character>. These two constants may be separated by one or
more space characters. Zero or more space characters may be used for
separating the angle brackets (< >) from the numbers.
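
As an illustration of the ordered-pair syntax, the Python sketch below
extracts <uppercase lowercase> pairs from a ul specification. It is a
sketch of the notation only; parse_ul is a hypothetical name and the
regular expression is one possible reading of the rules above.

```python
import re

# A constant is either hexadecimal (0x..) or octal (leading 0).
_CONST = r"(0[xX][0-9a-fA-F]+|0[0-7]*)"
_PAIR = re.compile(r"<\s*" + _CONST + r"\s+" + _CONST + r"\s*>")


def _to_int(token: str) -> int:
    """Interpret a constant as hexadecimal or octal."""
    return int(token, 16) if token[:2].lower() == "0x" else int(token, 8)


def parse_ul(spec: str) -> list:
    """Return (uppercase, lowercase) code pairs found in a ul line."""
    return [(_to_int(u), _to_int(l)) for u, l in _PAIR.findall(spec)]


if __name__ == "__main__":
    print(parse_ul("<0x41 0x61> < 0102 0142 >"))
```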
The following is the format of an input specification for cswidth:
cswidth n1[[:s1][,n2[:s2][,n3[:s3]]]]
where,
n1 byte width for supplementary code set 1, required
s1 screen width for supplementary code set 1
n2 byte width for supplementary code set 2
s2 screen width for supplementary code set 2
n3 byte width for supplementary code set 3
s3 screen width for supplementary code set 3
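
The cswidth format can be read as up to three comma-separated entries,
each a byte width optionally followed by a colon and a screen width. The
Python sketch below parses that shape. The name parse_cswidth is
hypothetical, the widths are assumed to be decimal integers, and an
omitted screen width is reported as None rather than guessing the value
chrtbl would substitute.

```python
def parse_cswidth(spec: str) -> list:
    """Parse n1[:s1][,n2[:s2][,n3[:s3]]] into (byte, screen) tuples."""
    parts = spec.strip().split(",")
    if not 1 <= len(parts) <= 3:
        raise ValueError("cswidth takes one to three code set entries")
    widths = []
    for part in parts:
        byte_width, colon, screen_width = part.partition(":")
        widths.append(
            (int(byte_width), int(screen_width) if colon else None)
        )
    return widths


if __name__ == "__main__":
    print(parse_cswidth("1:1,0:0,0:0"))
```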
decimal_point and thousands_sep are specified by a single character that
gives the delimiter. grouping is specified by a quoted string in which
each member may be in octal or hex representation. For example, \3 or
\x3 could be used to set the value of a member of the string to 3.
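
To show what a grouping string does in practice, the Python sketch below
decodes the quoted string and applies it to a run of digits. Both function
names are hypothetical. The sketch assumes that the last group size
repeats for the remaining digits, which is the convention used by the
grouping member in localeconv(3C); this page does not state how chrtbl
consumers treat the end of the string.

```python
import re

# Matches \xH.. (hexadecimal) or \O.. (octal) escapes inside the string.
_ESCAPE = re.compile(r"\\x([0-9a-fA-F]+)|\\([0-7]+)")


def parse_grouping(quoted: str) -> list:
    """Decode a quoted grouping string such as "\\3" into group sizes."""
    body = quoted.strip()
    if len(body) < 2 or body[0] != '"' or body[-1] != '"':
        raise ValueError("grouping must be a quoted string")
    sizes = []
    for match in _ESCAPE.finditer(body[1:-1]):
        hex_digits, oct_digits = match.groups()
        sizes.append(int(hex_digits, 16) if hex_digits else int(oct_digits, 8))
    return sizes


def group_digits(digits: str, sizes: list, sep: str) -> str:
    """Insert sep into digits, grouping from the right (assumed repeat)."""
    if not sizes:
        return digits
    groups, rest, i = [], digits, 0
    while rest:
        size = sizes[min(i, len(sizes) - 1)]   # assumption: last size repeats
        if size <= 0:
            groups.append(rest)
            break
        groups.append(rest[-size:])
        rest = rest[:-size]
        i += 1
    return sep.join(reversed(groups))


if __name__ == "__main__":
    print(group_digits("1234567", parse_grouping(r'"\3"'), ","))
```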
The following is an example of an input file used to create the USA-ENGLISH
code set definition table in a file named usa and the non-monetary
numeric formatting information in a file named num_usa.
LC_CTYPE usa
isupper 0x41 - 0x5a
islower 0x61 - 0x7a
isdigit 0x30 - 0x39
isspace 0x20 0x9 - 0xd
ispunct 0x21 - 0x2f 0x3a - 0x40 \
0x5b - 0x60 0x7b - 0x7e
iscntrl 0x0 - 0x1f 0x7f
isblank 0x9 0x20
isprint 0x20
isxdigit 0x30 - 0x39 0x61 - 0x66 \
0x41 - 0x46
ul <0x41 0x61> <0x42 0x62> <0x43 0x63> \
<0x44 0x64> <0x45 0x65> <0x46 0x66> \
<0x47 0x67> <0x48 0x68> <0x49 0x69> \
<0x4a 0x6a> <0x4b 0x6b> <0x4c 0x6c> \
<0x4d 0x6d> <0x4e 0x6e> <0x4f 0x6f> \
<0x50 0x70> <0x51 0x71> <0x52 0x72> \
<0x53 0x73> <0x54 0x74> <0x55 0x75> \
<0x56 0x76> <0x57 0x77> <0x58 0x78> \
<0x59 0x79> <0x5a 0x7a>
cswidth 1:1,0:0,0:0
LC_NUMERIC num_usa
decimal_point .
thousands_sep ,
grouping "\3"
/usr/lib/locale/locale/LC_CTYPE
data files containing character classification,
conversion, and character-set width information created
by chrtbl
/usr/lib/locale/locale/LC_NUMERIC
data files containing numeric formatting information
created by chrtbl
/usr/include/ctype.h
header file containing information used by character
classification and conversion routines
/usr/lib/locale/C/chrtbl_C
input file used to construct LC_CTYPE and LC_NUMERIC in
the default locale.
ctype(3C), setlocale(3C), environ(5)
The error messages produced by chrtbl are intended to be self-explanatory.
They indicate errors in the command line or syntactic
errors encountered within the input file.
Changing the files in /usr/lib/locale/C will cause the system to behave
unpredictably.
In IRIX 6.5, the content of the LC_CTYPE locale category was extended to
comply with the XPG/4 standard. The older LC_CTYPE binary format will
not be recognized by the C library. Therefore, all custom-built locales
created under an older version of IRIX must be regenerated with the later
versions of localedef(1) and associated chrtbl(1M)/wchrtbl(1M).