i18n_intro, i18n, LANG, LC_ALL, LC_COLLATE, LC_CTYPE,
LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME - Introduction
to internationalization (I18N)
Internationalization refers to the process of developing
programs without prior knowledge of the language, cultural
data, or character-encoding schemes that the programs are
expected to handle. In other words, internationalization
refers to the availability and use of interfaces that let
programs modify their behavior at run time for operation
in a specific language environment. The abbreviation I18N
is often used to stand for internationalization, as there
are 18 characters between the beginning "I" and the ending
"N" of that word.
The I18N interfaces and utilities provided with the operating
system conform to Issue 4 of X/Open CAE specifications.
A concept related to internationalization is localization
(L10N), which refers to the process of establishing information
within a computer system for each combination of
native language, cultural data, and coded character set
(codeset). A locale is a database that provides information
for a unique combination of these three components.
However, locales do not solve all of the problems that
localization must address. Many native languages require
additional support in the form of language-specific print
filters, fonts, codeset converters, character input methods,
and other kinds of specialized software.
See the following reference pages for additional introductory
information on topics related to internationalization:
l10n_intro(5)
    For more information on localization and locales
iconv_intro(5)
    For an introduction to codeset conversion
i18n_printing(5)
    For a summary of printer support for native languages
Characters, Character Sets, and Codesets
A character is a member of a set of elements used for the
organization, control, or representation of data.
A character set is a set of alphabetic or other characters
used to construct the words and other elementary units of
a native language or computer language. A character set
specifies only the characters that are included in the
set. ASCII, CNS 11643 and DTSCS are examples of character
sets.
A coded character set (codeset) is a set of unambiguous
rules that support one or more character sets and establish
the one-to-one relationship between each character
and its bit representation. In other words, a codeset consists
of the code points for characters in one or more
character sets. For example, DEC Hanyu (dechanyu) is a
codeset for Chinese and contains code points for characters
in the ASCII, CNS 11643-1986 (plane 1 and plane 2),
and DTSCS character sets.
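The relationship between a character and its code point can be observed from the shell. The following is a minimal sketch, assuming a POSIX-conforming printf and an ASCII-compatible codeset in the current locale; code_point is a hypothetical helper name used only for illustration.

```shell
# Print the numeric code point of a single character as interpreted
# by printf in the current locale. code_point is a hypothetical helper.
code_point() {
    # A leading single quote tells printf to use the numeric value
    # of the character that follows it.
    printf '%d\n' "'$1"
}

code_point A    # 65 in ASCII and ASCII-compatible codesets
code_point a    # 97 in ASCII and ASCII-compatible codesets
```

Characters outside ASCII can yield different values in different codesets, which is why the codeset is part of the locale.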
Language Announcement (Setting Locale)
Language announcement is the mechanism by which language,
cultural data, and codeset requirements are set either for
the system as a whole or by individual users. An application
can also set these requirements, although it is more
common for an internationalized application to use the
setting in effect for the user who runs the program. See
the System Administration manual for information about
setting systemwide defaults for shells. See setlocale(3)
and Writing Software for the International Market for
information on how applications query or set locale
requirements at run time.
Language announcement is performed by setting one or more
reserved environment variables to the name of an installed
locale. Each locale has associated with it collating
sequences, character conversion tables, character classification
tables, formats for different kinds of data, and
message catalogs. If the same locale meets user requirements
in all these categories, set only the LANG environment
variable to the locale name. A locale name usually
has the following format:
language_territory.codeset[@modifier]
Where language represents the human language of the
locale, territory represents a geographic country or
region, codeset is the coded character set used in the
locale, and the optional @modifier suffix represents additional
information for localization of data.
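The name format above can be taken apart with ordinary shell parameter expansion. The following is a sketch only, assuming names that follow the language_territory.codeset[@modifier] pattern; parse_locale is a hypothetical helper, not a system utility, and it sets global shell variables because POSIX sh has no local variables.

```shell
# Split a locale name of the form language_territory.codeset[@modifier]
# into its parts. parse_locale is a hypothetical helper for illustration.
parse_locale() {
    name=$1
    modifier=
    codeset=
    territory=
    # Peel off the optional @modifier suffix.
    case $name in
        *@*) modifier=${name#*@}; name=${name%%@*} ;;
    esac
    # Peel off the codeset that follows the first dot.
    case $name in
        *.*) codeset=${name#*.}; name=${name%%.*} ;;
    esac
    # Whatever follows the first underscore is the territory.
    case $name in
        *_*) territory=${name#*_}; name=${name%%_*} ;;
    esac
    language=$name
    printf 'language=%s territory=%s codeset=%s modifier=%s\n' \
        "$language" "$territory" "$codeset" "$modifier"
}

parse_locale zh_TW.eucTW@stroke
parse_locale en_US.ISO8859-1
```

Names that omit a component, such as the C locale, leave the corresponding fields empty.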
The following Korn shell example sets LANG to a locale
supporting the English language, United States cultural
data, and the ISO8859-1 codeset:
$ LANG=en_US.ISO8859-1
The following C shell example sets LANG to a locale supporting
the Traditional Chinese language, Hong Kong cultural
data, and the DEC Hanyu codeset:
% setenv LANG zh_HK.dechanyu
Locale name formats can vary from vendor to vendor. Use
the locale -a command to display the names of locales
installed on your system. See l10n_intro(5) for a list of
the locales provided with the Tru64 UNIX product.
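Because locale names vary, it can be useful to confirm that a name is installed before assigning it. The following sketch assumes that the locale and grep commands are available; locale_installed is a hypothetical helper, and the locale name tested is only an example that might be spelled differently on other systems.

```shell
# Return success if the exact name appears in the output of locale -a.
# locale_installed is a hypothetical helper for illustration.
locale_installed() {
    # -F: fixed string, -x: match the whole line, -q: no output
    locale -a 2>/dev/null | grep -Fqx -- "$1"
}

if locale_installed en_US.ISO8859-1; then
    echo "en_US.ISO8859-1 is installed"
else
    echo "en_US.ISO8859-1 is not installed; run locale -a to list alternatives"
fi
```

The comparison is an exact string match, so a locale listed under a different spelling of the same codeset is reported as not installed.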
An alternative way to set locale requirements for all
locale categories is to set the LC_ALL environment variable.
The difference between the LANG and LC_ALL variables
is that LC_ALL is a high-precedence variable that overrides
all other locale variables, including LANG. The LANG
variable, on the other hand, is a low-precedence variable.
When used by itself, the LANG variable implicitly sets all
locale categories to the specified locale just as LC_ALL
does. However, the LANG variable can be used together with
variables for specific locale categories to create a multilocale
environment. The category-specific locale variables
and what they control follow:
LC_COLLATE
    String collation
LC_CTYPE
    Character classification
LC_MESSAGES
    Translations for messages and valid strings for "yes"
    and "no" responses
LC_MONETARY
    The currency symbol and the format of monetary values
LC_NUMERIC
    The format of numeric values
LC_TIME
    The format of date and time values
    A locale can support only one set of date and time
    formats; however, there can be several sets of date
    and time formats in use for a particular language
    and territory. See l10n_intro(5) for information
    about creating a site-specific version of a locale
    to support date and time formats different from
    those supported by an installed locale.
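The precedence among LC_ALL, the category variables, and LANG can be sketched as a small shell function. This is an illustration under stated assumptions, not the system's implementation: effective_locale is a hypothetical helper, and the fallback to the C locale when no variable is set is an assumption, since that default is implementation-defined.

```shell
# Print the locale name that applies to one category, checking LC_ALL
# first, then the category variable, then LANG.
# effective_locale is a hypothetical helper for illustration.
effective_locale() {
    category=$1
    # Accept only the known category names before using eval.
    case $category in
        LC_COLLATE|LC_CTYPE|LC_MESSAGES|LC_MONETARY|LC_NUMERIC|LC_TIME) ;;
        *) echo "unknown category: $category" >&2; return 2 ;;
    esac
    eval "category_value=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' C    # assumed fallback when nothing is set
    fi
}

unset LC_ALL LC_COLLATE LC_TIME
LANG=zh_TW.eucTW
LC_COLLATE=zh_TW.eucTW@stroke
effective_locale LC_COLLATE    # prints zh_TW.eucTW@stroke
effective_locale LC_TIME       # prints zh_TW.eucTW
```

Setting LC_ALL afterward would make both calls print the LC_ALL value, which is why LC_ALL is usually left unset in a multilocale environment.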
The operating system provides dense code locales and Unicode
locales. Unicode locales are installed in
/usr/i18n/lib/nls/ucsloc/. Dense code locales are
installed in /usr/i18n/lib/nls/loc/. The Unicode locales
enable consistent wchar_t values across locales and platform
interoperability. The system administrator, as root,
can define the systemwide default as Unicode locales or
dense code locales by changing the target of the symbolic
link /usr/i18n/lib/nls/dloc/. See l10n_intro(5) for more
information on the Unicode locales and switching between
Unicode and dense code. See Unicode(5) for more information
about UCS-4 and UTF-8 formats.
Unicode locales, with a UTF-8 suffix, use UTF-32 as the
internal process code and UTF-8 as the file format.
The operating system also includes a complete set of non-UTF-8
Unicode locales in /usr/i18n/lib/nls/ucsloc/ that
provide UTF-32 internal process code for applications that
require file code in the format of the traditional UNIX or
a proprietary codeset.
A @modifier suffix indicates locale variants that support
alternative rules for collation in Asian languages. Use
locales with these suffixes only when setting LC_COLLATE.
For example, three different sets of collation rules
(chuyin, radical, and stroke) can be used with the locale
supporting the Chinese language, Taiwanese cultural data,
and the Taiwanese EUC codeset. If Korn shell users want to
use this locale, they might make the following settings:
$ LANG=zh_TW.eucTW
$ LC_COLLATE=zh_TW.eucTW@stroke
The preceding example implicitly sets all locale category
variables to zh_TW.eucTW, except for the LC_COLLATE variable,
which is set to zh_TW.eucTW@stroke. The following
locale command displays the variable settings after these
assignments:
$ locale
LANG=zh_TW.eucTW
LC_COLLATE=zh_TW.eucTW@stroke
LC_CTYPE="zh_TW.eucTW"
LC_MONETARY="zh_TW.eucTW"
LC_NUMERIC="zh_TW.eucTW"
LC_TIME="zh_TW.eucTW"
LC_MESSAGES="zh_TW.eucTW"
LC_ALL=
Commands: locale(1), setlocale(3)
Others: i18n_printing(5), iconv_intro(5), l10n_intro(5),
Unicode(5)
Writing Software for the International Market
Using International Software
System Administration
i18n_intro(5)