l10n_intro, l10n, locales, LOCPATH - Introduction to
localization (L10N)
Localization refers to the process of establishing information
within a computer system specific to each supported
language, cultural data, and coded character set (codeset)
combination. Each such combination gives rise to the definition
of one locale. The abbreviation L10N is often used
to stand for localization, as there are 10 characters
between the beginning "L" and the ending "N" of that word.
See i18n_intro(5) for introductory information about
internationalization and how to use system commands to set
a locale. See localedef(1), charmap(4), and locale(4) for
information about creating locales. See Writing Software
for the International Market for information about creating
locales and writing applications that use locales.
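The counting rule behind the L10N abbreviation can be sketched in a few lines. This is only an illustration; the function name numeronym is hypothetical and is not part of any system interface.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of inner letters + last letter."""
    if len(word) < 3:
        return word  # too short to have any inner letters worth counting
    return f"{word[0]}{len(word) - 2}{word[-1]}"


print(numeronym("localization"))          # l10n
print(numeronym("internationalization"))  # i18n
```

The same rule yields the i18n prefix used in the name of the i18n_intro(5) reference page.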
The current release of the operating system supports the
following languages with locales. Each language is discussed
separately in its own reference page.
Catalan
Chinese (Simplified and Traditional)
Czech
Danish
Dutch
English (discussed in this reference page)
Finnish
Flemish
French
German
Greek
Hebrew
Hungarian
Icelandic
Italian
Japanese
Korean
Lithuanian
Norwegian
Polish
Portuguese
Russian
Slovak
Slovene
Spanish
Swedish
Thai
Turkish
For some of the languages, more than one codeset and country
or territory are supported. Hence, multiple locales
are supported for certain languages. The following list
describes all the supported locales. For information about
the character encoding used by a particular locale, see
the reference page for the codeset specified in the last
part of the locale name or, for those that end in UTF-8, see Unicode(5).
Catalan locale for Spain (uses the Latin-1 codeset)
Catalan locale for Spain (uses the Latin-9 codeset)
Catalan locale for Spain (uses the UTF-8 codeset)
Czech locale for Czech Republic (uses the Latin-2 codeset)
Czech locale for Czech Republic (uses the UTF-8 codeset)
Danish locale for Denmark (uses the Latin-1 codeset)
Danish locale for Denmark (uses the Latin-9 codeset)
Danish locale for Denmark (uses the UTF-8 codeset)
German locale for Switzerland (uses the Latin-1 codeset)
German locale for Switzerland (uses the Latin-9 codeset)
German locale for Switzerland (uses the UTF-8 codeset)
German locale for Germany (uses the Latin-1 codeset)
German locale for Germany (uses the Latin-9 codeset)
German locale for Germany (uses the UTF-8 codeset)
Greek locale for Greece (uses the ISO Greek codeset)
Greek locale for Greece (uses the UTF-8 codeset)
English locale that includes the euro character (uses the UTF-8 codeset)
    This locale both supports the euro character and
    defines the decimal point as a comma (,) and the
    thousands separator as a period (.). Therefore,
    this locale is useful in many European countries,
    not just those for which English is the native language,
    when assigned only to the LC_MONETARY locale
    category or environment variable.
English locale for Great Britain (uses the Latin-1 codeset)
English locale for Great Britain (uses the Latin-9 codeset)
English locale for Great Britain (uses the UTF-8 codeset)
English locale for the United States (uses the Latin-1 codeset)
English locale for the United States (uses the Latin-9 codeset)
English locale for the United States (uses cp850 encoding)
    Use this locale with data that contains accented
    characters and that was generated on a PC using the
    cp850 code page for character encoding. This character
    encoding is usually the default for the DOS
    and Windows operating systems in Europe. The
    en_US.ISO8859-1 and en_US.cp850 locales encode
    English characters the same way but use different
    values for accented and other non-English characters
    in the Latin-1 character set.
English locale for the United States (uses the UTF-8 codeset)
English locale for the United States (uses the UTF-8 codeset)
    The @euro variant defines the local currency sign
    to be the euro character and the international currency
    sign to be EUR. See also en_EU.UTF-8@euro.
Spanish locale for Spain (uses the Latin-1 codeset)
Spanish locale for Spain (uses the Latin-9 codeset)
Spanish locale for Spain (uses the UTF-8 codeset)
Finnish locale for Finland (uses the Latin-1 codeset)
Finnish locale for Finland (uses the Latin-9 codeset)
Finnish locale for Finland (uses the UTF-8 codeset)
French locale for Belgium (uses the Latin-1 codeset)
French locale for Belgium (uses the Latin-9 codeset)
French locale for Belgium (uses the UTF-8 codeset)
French locale for Canada (uses the Latin-1 codeset)
French locale for Canada (uses the Latin-9 codeset)
French locale for Canada (uses the UTF-8 codeset)
French locale for Switzerland (uses the Latin-1 codeset)
French locale for Switzerland (uses the Latin-9 codeset)
French locale for Switzerland (uses the UTF-8 codeset)
French locale for France (uses the Latin-1 codeset)
French locale for France (uses the Latin-9 codeset)
French locale for France (uses the UTF-8 codeset)
Hebrew locale for Israel (uses the ISO Hebrew codeset)
Hungarian locale for Hungary (uses the Latin-2 codeset)
Hungarian locale for Hungary (uses the UTF-8 codeset)
Icelandic locale for Iceland (uses the Latin-1 codeset)
Icelandic locale for Iceland (uses the Latin-9 codeset)
Icelandic locale for Iceland (uses the UTF-8 codeset)
Italian locale for Italy (uses the Latin-1 codeset)
Italian locale for Italy (uses the Latin-9 codeset)
Italian locale for Italy (uses the UTF-8 codeset)
Hebrew locale for Israel (uses the ISO Hebrew codeset)
    This locale name is supported for backward compatibility.
    The recommended name to use for the ISO
    Hebrew locale is he_IL.ISO8859-8.
Japanese locale for Japan (uses the DEC Kanji codeset)
Japanese locale for Japan (uses the Japanese EUC codeset)
Japanese locale for Japan (uses the Super DEC Kanji codeset)
Japanese locale for Japan (uses the Shift JIS codeset)
Japanese locale for Japan (uses the UTF-8 codeset)
Korean locale for Korea (uses the DEC Korean codeset)
Korean locale for Korea (uses the Korean EUC codeset)
Korean locale for Korea (uses the UTF-8 codeset)
Lithuanian locale for Lithuania (uses the Latin-4 codeset)
Lithuanian locale for Lithuania (uses the UTF-8 codeset)
Flemish locale for Belgium (uses the Latin-1 codeset)
Flemish locale for Belgium (uses the Latin-9 codeset)
Flemish locale for Belgium (uses the UTF-8 codeset)
Dutch locale for the Netherlands (uses the Latin-1 codeset)
Dutch locale for the Netherlands (uses the Latin-9 codeset)
Dutch locale for the Netherlands (uses the UTF-8 codeset)
Norwegian locale for Norway (uses the Latin-1 codeset)
Norwegian locale for Norway (uses the Latin-9 codeset)
Norwegian locale for Norway (uses the UTF-8 codeset)
Polish locale for Poland (uses the Latin-2 codeset)
Polish locale for Poland (uses the UTF-8 codeset)
Portuguese locale for Portugal (uses the Latin-1 codeset)
Portuguese locale for Portugal (uses the Latin-9 codeset)
Portuguese locale for Portugal (uses the UTF-8 codeset)
Russian locale for Russia (uses the ISO Cyrillic codeset)
Russian locale for Russia (uses the UTF-8 codeset)
Slovak locale for Slovakia (uses the Latin-2 codeset)
Slovak locale for Slovakia (uses the UTF-8 codeset)
Slovene locale for Slovenia (uses the Latin-2 codeset)
Slovene locale for Slovenia (uses the UTF-8 codeset)
Swedish locale for Sweden (uses the Latin-1 codeset)
Swedish locale for Sweden (uses the Latin-9 codeset)
Swedish locale for Sweden (uses the UTF-8 codeset)
Thai locale for Thailand (uses the TACTIS codeset)
Turkish locale for Turkey (uses the Latin-5 codeset)
Turkish locale for Turkey (uses the UTF-8 codeset)
Simplified Chinese locale for the People's Republic of China (uses the DEC Hanzi codeset)
Simplified Chinese locale for the People's Republic of China (uses the GBK codeset, an extension of the GB 2312-80 codeset)
Simplified Chinese locale for the People's Republic of China (uses the GB18030 codeset, which extends GBK by means of 4-byte encoding)
Simplified Chinese locale for the People's Republic of China (uses the UTF-8 codeset)
Traditional Chinese locale for Hong Kong (uses the BIG-5 codeset)
Traditional Chinese locale for Hong Kong (uses the DEC Hanyu codeset)
Simplified Chinese locale for Hong Kong (uses the DEC Hanzi codeset)
Traditional Chinese locale for Hong Kong (uses the Taiwanese EUC codeset)
Traditional Chinese locale for Hong Kong (uses the UTF-8 codeset)
Traditional Chinese locale for Taiwan (uses the BIG-5 codeset)
Traditional Chinese locale for Taiwan (uses the DEC Hanyu codeset)
Traditional Chinese locale for Taiwan (uses the Taiwanese EUC codeset)
Traditional Chinese locale for Taiwan (uses the UTF-8 codeset)
    This locale supports Simplified Chinese as well as
    Traditional Chinese.
For the zh_CN.dechanzi locale, the @pinyin, @radical, and
@stroke variants are available for sorting by pinyin, radical,
and stroke, respectively. For the zh_TW.big5,
zh_TW.dechanyu, and zh_TW.eucTW locales, the @chuyin,
@radical, and @stroke variants are available for sorting
by chuyin, radical, and stroke, respectively. These variant
locale names (those including the @collation_modifier
suffix) are available for assignment to the LC_COLLATE
variable.
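The structure of these locale names (language, territory, codeset, and an optional @modifier suffix) can be taken apart mechanically. The following is a minimal sketch, assuming the conventional language_TERRITORY.codeset@modifier layout; the LocaleName type and parse_locale_name function are hypothetical helpers, not system interfaces.

```python
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


def parse_locale_name(name: str) -> LocaleName:
    """Split language[_territory][.codeset][@modifier] into its parts."""
    base, _, modifier = name.partition("@")      # e.g. "stroke"
    base, _, codeset = base.partition(".")       # e.g. "big5"
    language, _, territory = base.partition("_")  # e.g. "zh", "TW"
    return LocaleName(language, territory or None, codeset or None, modifier or None)


print(parse_locale_name("zh_TW.big5@stroke"))
print(parse_locale_name("zh_CN.dechanzi@pinyin"))
```

A name without a suffix, such as zh_TW.big5, simply yields None for the modifier field.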
The Latin-9 (ISO8859-15) and UTF-8 locales are the only
locales that include the euro monetary symbol in the coded
character set. The *.UTF-8@euro locales also define the
local currency symbol to be the euro character and the
international currency symbol to be EUR. See euro(5) for
more information about the euro symbol and how it is supported.
You can use the -a option with the locale command to list
all the locales available on the system. The POSIX (or C)
locale is always available because it must exist on all
systems that conform to The Open Group's UNIX specifications.
The POSIX locale is the default locale when locale
variables are not set.
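As a rough illustration of the paragraph above, a program can collect the output of locale -a and select the C locale explicitly. This sketch assumes only that a locale command may or may not be on the search path; the available_locales helper is a hypothetical name.

```python
import locale
import subprocess
from typing import List


def available_locales() -> List[str]:
    """Return the names printed by `locale -a`, or [] if the command cannot be run."""
    try:
        result = subprocess.run(
            ["locale", "-a"],
            capture_output=True,
            text=True,
            errors="replace",  # tolerate names that are not valid in our encoding
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


# The C locale is expected to be selectable on any conforming system.
selected = locale.setlocale(locale.LC_ALL, "C")
print(selected)
print(len(available_locales()), "locales reported by locale -a")
```

The list returned depends entirely on which locales are installed on the system where the sketch runs.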
Note
The dxterm terminal emulator does not support locales
based on the Unicode (UTF-8) or Latin-9 (ISO8859-15) codesets.
Use dtterm, the default terminal emulator for the
Common Desktop Environment (CDE), with locales based on
the Latin-9 and UTF-8 codesets.
System Locales
When you install Worldwide Language Support, localization
is supported by two types of locales: Unicode locales and
dense code locales.
Unicode locales conform to Unicode and ISO/IEC 10646 standards
and use UTF-32 as the wide character encoding. Under
UTF-32 wide character encoding, wchar_t values represent
the same characters regardless of the locale and, because
Unicode standards prevail, implementation is consistent
across platforms.
Locales whose names end in UTF-8 use file code and internal
process code (wchar_t encoding) defined in the ISO 10646 and
Unicode standards.
Other, non-UTF-8 Unicode locales use traditional UNIX and
proprietary codesets for the file code while using UTF-32
as the internal process code. A subset of these Unicode
locales have a @ucs4 modifier; however, they are the same
as the locales without the @ucs4 modifier. The @ucs4 subset
is provided for backward compatibility and may be
removed in the future. You cannot select @ucs4 locales
from the CDE login menu; you must specify the locale name
in the LANG environment variable.
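The distinction between file code and process code can be illustrated with the euro character. This is only an analogy: Python strings hold Unicode code points, which stand in here for UTF-32 wchar_t values, and the standard Python codecs iso8859-15 and utf-8 stand in for the Latin-9 and UTF-8 file codes.

```python
EURO = "\u20ac"  # EURO SIGN

# File code: the bytes stored on disk depend on the codeset.
latin9_bytes = EURO.encode("iso8859-15")  # one byte, 0xA4
utf8_bytes = EURO.encode("utf-8")         # three bytes, 0xE2 0x82 0xAC

# Process code: after decoding, both yield the same code point,
# which is what a UTF-32 wide character value would hold.
latin9_wide = ord(latin9_bytes.decode("iso8859-15"))
utf8_wide = ord(utf8_bytes.decode("utf-8"))

print(latin9_bytes.hex(), utf8_bytes.hex())
print(hex(latin9_wide), hex(utf8_wide))
```

Both decoded values are 0x20ac even though the stored byte sequences differ, which is the consistency property described for Unicode locales.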
The universal.UTF-8 locale is also available (for use by
applications rather than end users). It supports the complete
set of characters in the universal character set
(UCS).
See Unicode(5) for more information about encoding formats.
For locales, file code may include characters encoded in
more than 1 byte; therefore, use these locales in applications
that can process multibyte data. Design new applications
based on multibyte locales, which incorporate a
large character repertoire, to enable the application to
expand future character support without changing the character
set.
Dense code locales use dense code for wide character
encoding to minimize table size (that is, codepoints are
assigned consecutively with no empty positions). Under
dense code locales, a wchar_t value for one locale may not
represent the same character in another locale and, thus,
is locale specific. Dense code locales are appropriate for
applications that have no dependencies on the internal
process code or, because dense code locales are slightly
more efficient than Unicode locales, require better performance.
All valid codepoints in multibyte character sets are
mapped to valid codepoints in Unicode; codepoints with no
standard Unicode equivalent are mapped to codepoints in the
private use area. Thus, dense code locales are equivalent
to Unicode locales. In general, the same charmaps and
locale source can be used for Unicode and dense code
locales. However, Unicode and dense code characters that
are not defined in the LC_COLLATE section may be sorted
differently.
A Unicode locale exists for each dense code locale. (However,
not all Unicode locales have a dense code version.)
For Latin-1 locales (ISO8859-1), the dense code and Unicode
locales are identical because Latin-1 characters are
the same as the first 256 characters in Unicode.
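The statement about Latin-1 can be checked directly. The sketch below uses Python's latin-1 and iso8859-15 codecs as stand-ins for the system codesets and confirms that every Latin-1 byte value decodes to the Unicode code point with the same number; the function name is hypothetical.

```python
def latin1_matches_unicode() -> bool:
    """True if each Latin-1 byte value equals the code point it decodes to."""
    return all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))


print(latin1_matches_unicode())

# For contrast, Latin-9 reassigns a few positions: byte 0xA4 is the euro sign,
# so its byte value and its Unicode code point are no longer the same number.
print(hex(ord(b"\xa4".decode("iso8859-15"))))
```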
The operating system also supports three UCS transformation
formats (UTFs), UTF-8, UTF-16, and UTF-32, all of
which are defined in the Unicode standard. See Unicode(5)
for a full description of Unicode, UCS-4, and the transformation
formats.
The Unicode locales are installed in
/usr/i18n/lib/nls/ucsloc/. Dense code locales are
installed in /usr/i18n/lib/nls/loc/. A symbolic link,
/usr/i18n/lib/nls/dloc, points to the system default
locales. For example, the Japanese locale filename,
/usr/lib/nls/loc/ja_JP.eucJP, is a symbolic link to
/usr/i18n/lib/nls/dloc/ja_JP.eucJP, where /dloc is a symbolic
link to either /ucsloc for the Unicode version, or
/loc for the dense code version, of the Japanese locale.
Keep in mind that the same locale name can refer to a
Unicode locale or to a dense code locale, depending on the
setting of the symbolic link. Thus, if running an application
in a locale is problematic, check the symbolic
link.
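Checking the symbolic link can be scripted. The following is a minimal sketch, assuming the directory names given above (ucsloc for Unicode locales, loc for dense code locales); the locale_flavor function is a hypothetical helper, and it reports "unknown" on systems where the link does not exist.

```python
import os


def locale_flavor(link_path: str = "/usr/i18n/lib/nls/dloc") -> str:
    """Report which locale directory the dloc symbolic link points to."""
    if not os.path.islink(link_path):
        return "unknown"
    # The link target may be relative ("ucsloc") or absolute; only the
    # final path component matters for this check.
    target = os.path.basename(os.path.normpath(os.readlink(link_path)))
    return {"ucsloc": "unicode", "loc": "dense code"}.get(target, "unknown")


print(locale_flavor())
```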
Because Unicode locales use consistent values for characters
in wchar_t form, a default link to Unicode locales
can increase consistency across locales and platforms.
However, some users may prefer the older, dense code
locales that use proprietary algorithms to convert characters
to wchar_t form, or an application may have dependencies
on dense code wchar_t encoding. To switch between
Unicode and dense code locales, the system administrator,
as root, uses i18nconfig to change the systemwide default
or manually repoints the symbolic link
/usr/i18n/lib/nls/dloc at either /usr/i18n/lib/nls/ucsloc
(Unicode) or /usr/i18n/lib/nls/loc (dense code).
Environment Variables Related to Localization
The following system environment variables can be set
(usually only by installed applications or by programmers
who are testing applications or converters under development)
to override the default search path for certain
kinds of localized files:
LOCPATH
    Specifies the search path for locales and codeset
    converters. This environment variable is not defined
    by current industry standards. See iconv_intro(5),
    iconv_open(3), and setlocale(3) for more information.
    Because the LOCPATH variable is not defined by
    standards, it is recommended for use only when
    testing locales or converters under development and
    not as a systemwide method for finding installed
    converters or locales. When you set LOCPATH, make
    sure that the search path is valid for both locales
    and converters. Otherwise, application and system
    software can find only locales or only converters
    in environments where both kinds of files are
    required.
NLSPATH
    Specifies the search path for message catalogs,
    which contain translated text for programs.
    This variable is used primarily by the
    catopen() function. See catopen(3) for detailed
    information on NLSPATH.
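Because LOCPATH is recommended only for testing, one cautious approach is to set it for a single child process rather than for the whole session. The sketch below assumes a hypothetical test directory, /tmp/test_locales, and a hypothetical helper named run_with_locpath; the child here merely echoes the variable back to show that it was received.

```python
import os
import subprocess
import sys
from typing import List


def run_with_locpath(command: List[str], locpath: str) -> subprocess.CompletedProcess:
    """Run a command with LOCPATH set only in the child's environment."""
    env = dict(os.environ)    # copy, so the parent environment is untouched
    env["LOCPATH"] = locpath
    return subprocess.run(command, env=env, capture_output=True, text=True)


# /tmp/test_locales is a made-up location used purely for illustration.
result = run_with_locpath(
    [sys.executable, "-c", "import os; print(os.environ['LOCPATH'])"],
    "/tmp/test_locales",
)
print(result.stdout.strip())
```

The same pattern applies to NLSPATH when testing message catalogs under development.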
Customizing Locales
Partial source files, along with an associated Makefile,
are available for many locales in the /usr/lib/nls/loc/src
directory. By editing one of these source files and using
the Makefile to rebuild the locale (make locale_name), you
can customize one or more of the following features:
The format of affirmative and negative responses (LC_MESSAGES section)
Rules and symbols for formatting monetary numeric information (LC_MONETARY section)
Rules and symbols for formatting nonmonetary numeric information (LC_NUMERIC section)
Rules and symbols for formatting date and time information (LC_TIME section)
As described in locale(4), the LC_CTYPE and LC_COLLATE
sections of these locale sources are not customizable
using this method. This means that you cannot use one of
these sources to change how characters are classified or
collated. By implication, this also means that you cannot
add a new character to a locale that does not already support
it. For example, you cannot add the European monetary
character (euro) to a locale that does not already
support that character. However, you can edit the LC_MONETARY
section to define a string identifier for euro by
using characters that the locale does support. For example,
you could replace the existing monetary symbol with
EUR.
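After rebuilding a customized locale, the monetary settings that a program actually sees can be inspected. This sketch uses the C locale only because it is always available; in that locale the monetary fields are empty strings. Substituting the name of a customized locale is left as an assumption about the local system.

```python
import locale

# Select the locale to inspect; "C" is used here because it always exists.
locale.setlocale(locale.LC_ALL, "C")
conv = locale.localeconv()

# Fields that correspond to the LC_MONETARY section of a locale source.
monetary = {
    key: conv[key]
    for key in (
        "currency_symbol",    # local currency symbol
        "int_curr_symbol",    # international currency symbol, such as EUR
        "mon_decimal_point",
        "mon_thousands_sep",
    )
}
print(monetary)
print(repr(conv["decimal_point"]))  # LC_NUMERIC decimal point
```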
See locale(4) for more information on a locale source
file. See Writing Software for the International Market
for information on user customization of LC_CTYPE and
LC_COLLATE.
Caution
Customized versions of locales that are provided with the
operating system are not preserved when the operating system
is reinstalled, even when an update installation procedure
is used. Therefore, you must back up files for customized
locales and their sources before reinstalling the
operating system. After the reinstallation is complete,
you must restore your customized locales to the system. If
the newly installed sources have revisions when compared
to the old sources, it might be preferable to apply your
customizations to the newly installed sources and rebuild
your customized locales.
Commands: locale(1), localedef(1)
Functions: catopen(3)
Files: charmap(4), locale(4)
Others: Catalan(5), Chinese(5), Czech(5), dechanyu(5),
dechanzi(5), deckanji(5), deckorean(5), Dutch(5),
eucJP(5), eucKR(5), eucTW(5), euro(5), Finnish(5),
French(5), GB18030(5), GBK(5), German(5), Greek(5),
Hebrew(5), Hungarian(5), i18n_intro(5), i18n_printing(5),
Icelandic(5), iconv_intro(5), iso2022(5), iso2022jp(5),
iso8859-1(5), iso8859-2(5), iso8859-4(5), iso8859-5(5),
iso8859-7(5), iso8859-8(5), iso8859-9(5), iso8859-15(5),
Italian(5), Japanese(5), jiskanji(5), Korean(5),
Lithuanian(5), Norwegian(5), Polish(5), Portuguese(5),
Russian(5), sbig5(5), sdeckanji(5), shiftjis(5), Slovak(5),
Slovene(5), Spanish(5), Swedish(5), TACTIS(5), telecode(5),
Thai(5), Turkish(5), Unicode(5)
Writing Software for the International Market
Using International Software
l10n_intro(5)