NAME

       i18n_intro, i18n, LANG, LC_ALL, LC_COLLATE, LC_CTYPE,
       LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME - Introduction
       to internationalization (I18N)

DESCRIPTION

       Internationalization  refers  to the process of developing
       programs without prior knowledge of the language, cultural
       data,  or character-encoding schemes that the programs are
       expected to handle. In other  words,  internationalization
       refers  to the availability and use of interfaces that let
       programs modify their behavior at run time  for  operation
       in a specific language environment.  The abbreviation I18N
       is often used to stand for internationalization, as  there
       are 18 characters between the beginning "I" and the ending
       "N" of that word.

       The I18N interfaces and utilities provided with the operating
       system conform to Issue 4 of the X/Open CAE specifications.

       A concept related to internationalization is localization
       (L10N), which refers to the process of establishing information
       within a computer system for each combination of
       native language, cultural data, and coded character set
       (codeset). A locale is a database that provides information
       for a unique combination of these three components.
       However, locales do not solve all of the problems that
       localization must address. Many native languages require
       additional support in the form of language-specific print
       filters, fonts, codeset converters, character input methods,
       and other kinds of specialized software.

       See the following reference pages for additional introductory
       information on topics related to internationalization:

       l10n_intro(5)
              For more information on localization and locales

       iconv_intro(5)
              For an introduction to codeset conversion

       i18n_printing(5)
              For a summary of printer support for native languages

   Characters, Character Sets, and Codesets
       A character is a member of a set of elements used for  the
       organization, control, or representation of data.

       A character set is a set of alphabetic or other characters
       used to construct the words and other elementary units  of
       a  native  language or computer language.  A character set
       specifies only the characters that  are  included  in  the
       set. ASCII, CNS 11643, and DTSCS are examples of character
       sets.

       A coded character set (codeset) is a set of unambiguous
       rules that supports one or more character sets and establishes
       the one-to-one relationship between each character
       and its bit representation. In other words, a codeset consists
       of the code points for characters in one or more
       character sets. For example, DEC Hanyu (dechanyu) is a
       codeset for Chinese and contains code points for characters
       in the ASCII, CNS 11643-1986 (plane 1 and plane 2),
       and DTSCS character sets.

   Language Announcement (Setting Locale)
       Language announcement is the mechanism by which  language,
       cultural data, and codeset requirements are set either for
       the system as a whole or by individual users. An application
       can also set these requirements, although it is more
       common for an internationalized  application  to  use  the
       setting  in  effect for the user who runs the program. See
       the System Administration  manual  for  information  about
       setting  systemwide  defaults for shells. See setlocale(3)
       and Writing Software  for  the  International  Market  for
       information  on  how  applications  query  or  set  locale
       requirements at run time.

       Language announcement is performed by setting one or more
       reserved environment variables to the name of an installed
       locale. Each locale has associated with it collating
       sequences, character conversion tables, character classification
       tables, formats for different kinds of data, and
       message catalogs. If the same locale meets user requirements
       in all these categories, set only the LANG environment
       variable to the locale name. A locale name usually
       has the following format:

       language_territory.codeset[@modifier]

       Where  language  represents  the  human  language  of  the
       locale,  territory  represents  a  geographic  country  or
       region, codeset is the coded character  set  used  in  the
       locale, and the optional @modifier suffix represents additional
       information for localization of data.
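
       The name format described above can be taken apart with
       ordinary shell parameter expansion. The following is a
       minimal sketch, not a system utility: parse_locale_name is
       a hypothetical helper, and it assumes the name follows the
       usual format. Vendor-specific names that deviate from that
       format are not handled.

```shell
# parse_locale_name NAME
# Split a locale name of the form language_territory.codeset[@modifier]
# into its parts using only POSIX parameter expansion.
# Hypothetical helper for illustration; variables are global because
# POSIX sh has no "local".
parse_locale_name() {
    name=$1
    modifier=
    codeset=
    territory=
    # Peel off the optional @modifier suffix first.
    case $name in
        *@*) modifier=${name#*@}; name=${name%%@*} ;;
    esac
    # Then the .codeset part.
    case $name in
        *.*) codeset=${name#*.}; name=${name%%.*} ;;
    esac
    # Finally separate the language from the territory.
    case $name in
        *_*) territory=${name#*_}; name=${name%%_*} ;;
    esac
    language=$name
    printf 'language=%s territory=%s codeset=%s modifier=%s\n' \
        "$language" "$territory" "$codeset" "$modifier"
}

parse_locale_name zh_TW.eucTW@stroke
# prints: language=zh territory=TW codeset=eucTW modifier=stroke
parse_locale_name en_US.ISO8859-1
# prints: language=en territory=US codeset=ISO8859-1 modifier=
```

       Parts that are absent from the name are reported as empty
       strings.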

       The following Korn shell example sets LANG to a locale
       supporting the English language, United States cultural
       data, and the ISO8859-1 codeset:

       $ LANG=en_US.ISO8859-1

       The following C shell example sets LANG to a locale supporting
       the Traditional Chinese language, Hong Kong cultural
       data, and the DEC Hanyu codeset: % setenv LANG

       Locale  name  formats  can vary from vendor to vendor. Use
       the locale -a command to  display  the  names  of  locales
       installed on your system.  See l10n_intro(5) for a list of
       the locales provided with the Tru64 UNIX product.
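
       Before setting LANG, a script can check the locale -a
       listing for the name it intends to use. The following is a
       minimal sketch: is_listed is a hypothetical helper, and the
       locale name is the one from the Korn shell example above.
       Whether that locale is actually installed depends on the
       system.

```shell
# is_listed NAME
# Succeed if NAME appears as a complete line on standard input.
# -F treats NAME as a fixed string (so "." is literal), -x requires a
# whole-line match, and -q suppresses output.
# Hypothetical helper for illustration, not a system utility.
is_listed() {
    grep -Fqx -- "$1"
}

# Feed the list of installed locales to the helper. If the locale
# command is unavailable, the check simply reports "not installed".
if locale -a 2>/dev/null | is_listed en_US.ISO8859-1; then
    echo "en_US.ISO8859-1 is installed"
else
    echo "en_US.ISO8859-1 is not installed"
fi
```

       A whole-line match matters here because one installed
       locale name can be a prefix of another.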

       An alternative way to set locale requirements for all
       locale categories is to set the LC_ALL environment variable.
       The difference between the LANG and LC_ALL variables
       is that LC_ALL is a high-precedence variable that overrides
       all other locale variables, including LANG. The LANG
       variable, on the other hand, is a low-precedence variable.
       When used by itself, the LANG variable implicitly sets all
       locale categories to the specified locale just as LC_ALL
       does. However, the LANG variable can be used together with
       variables for specific locale categories to create a multilocale
       environment. The category-specific locale variables
       and what they control follow:

       LC_COLLATE
              String collation

       LC_CTYPE
              Character classification

       LC_MESSAGES
              Translations for messages and valid strings for
              "yes" and "no" responses

       LC_MONETARY
              The currency symbol and the format of monetary values

       LC_NUMERIC
              The format of numeric values

       LC_TIME
              The format of date and time values

              A  locale can support only one set of date and time
              formats; however, there can be several sets of date
              and  time  formats in use for a particular language
              and territory. See  l10n_intro(5)  for  information
              about  creating a site-specific version of a locale
              to support date and  time  formats  different  from
              those supported by an installed locale.
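
       The precedence among LC_ALL, the category-specific
       variables, and LANG can be sketched as a small shell
       function. This is an illustration only: effective_locale
       is a hypothetical helper, and the fallback to the C locale
       when no variable is set is an assumption based on common
       POSIX behavior rather than something this page states.

```shell
# effective_locale CATEGORY
# Print the locale name in effect for one category, checking LC_ALL
# first, then the category-specific variable, then LANG.
# Hypothetical helper for illustration; the final fallback to "C" is
# an assumption.
effective_locale() {
    category=$1
    # Accept only known category names so the eval below is safe.
    case $category in
        LC_COLLATE|LC_CTYPE|LC_MESSAGES|LC_MONETARY|LC_NUMERIC|LC_TIME) ;;
        *) echo "unknown category: $category" >&2; return 2 ;;
    esac
    # Indirect lookup of the variable named by $category.
    eval "specific=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' C
    fi
}

# Run in a subshell so the caller's environment is left untouched.
(
    unset LC_ALL LC_CTYPE
    LANG=zh_TW.eucTW
    LC_COLLATE=zh_TW.eucTW@stroke
    effective_locale LC_COLLATE    # prints zh_TW.eucTW@stroke
    effective_locale LC_CTYPE      # prints zh_TW.eucTW
)
```

       Some shells print a warning on standard error when a
       locale variable is set to a name that is not installed;
       the function's output is not affected.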

       The operating system provides dense code locales and Unicode
       locales. Unicode locales are installed in
       /usr/i18n/lib/nls/ucsloc/. Dense code locales are
       installed in /usr/i18n/lib/nls/loc/. The Unicode locales
       enable consistent wchar_t values across locales and platform
       interoperability. The system administrator, as root,
       can define the systemwide default as Unicode locales or
       dense code locales by changing the target of the symbolic
       link /usr/i18n/lib/nls/dloc/. See l10n_intro(5) for more
       information on the Unicode locales and switching between
       Unicode and dense code. See Unicode(5) for more information
       about UCS-4 and UTF-8 formats.

       Unicode locales, with a UTF-8 suffix, use  UTF-32  as  the
       internal process code and UTF-8 as the file format.

       The operating system also includes a complete set of non-UTF-8
       Unicode locales in /usr/i18n/lib/nls/ucsloc/ that
       provide UTF-32 internal process code for applications that
       require file code in the format of the traditional UNIX or
       a proprietary codeset.

       A  @modifier suffix indicates locale variants that support
       alternative rules for collation in Asian  languages.   Use
       locales  with these suffixes only when setting LC_COLLATE.
       For example,  three  different  sets  of  collation  rules
       (chuyin,  radical, and stroke) can be used with the locale
       supporting the Chinese language, Taiwanese cultural  data,
       and the Taiwanese EUC codeset. If Korn shell users want to
       use this locale, they might make the following settings:

       $ LANG=zh_TW.eucTW
       $ LC_COLLATE=zh_TW.eucTW@stroke

       The preceding example implicitly sets all locale category
       variables to zh_TW.eucTW, except for the LC_COLLATE variable,
       which is set to zh_TW.eucTW@stroke. The following
       locale command displays the variable settings after these
       assignments:

       $ locale
       LANG=zh_TW.eucTW
       LC_COLLATE=zh_TW.eucTW@stroke
       LC_CTYPE="zh_TW.eucTW"
       LC_MONETARY="zh_TW.eucTW"
       LC_NUMERIC="zh_TW.eucTW"
       LC_TIME="zh_TW.eucTW"
       LC_MESSAGES="zh_TW.eucTW"
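
       In the listing above, the explicitly set LC_COLLATE value
       is unquoted, while values implied by LANG appear in double
       quotes. The following sketch reproduces that convention as
       the POSIX locale utility defines it. show_locale_settings
       is a hypothetical helper written for illustration, not the
       implementation of the locale command, and it lists the
       categories in the same order as the example.

```shell
# show_locale_settings
# Print LANG and the six category variables in the style of locale(1):
# explicitly set values are printed bare, implied values (taken from
# LC_ALL or LANG) are printed in double quotes.
# Hypothetical helper for illustration only.
show_locale_settings() {
    printf 'LANG=%s\n' "${LANG:-}"
    for category in LC_COLLATE LC_CTYPE LC_MONETARY LC_NUMERIC LC_TIME LC_MESSAGES; do
        # Indirect lookup of the variable named by $category.
        eval "value=\${$category:-}"
        if [ -n "${LC_ALL:-}" ]; then
            # LC_ALL overrides everything, so the value is implied.
            printf '%s="%s"\n' "$category" "$LC_ALL"
        elif [ -n "$value" ]; then
            # Explicitly set for this category.
            printf '%s=%s\n' "$category" "$value"
        else
            # Implied by LANG.
            printf '%s="%s"\n' "$category" "${LANG:-}"
        fi
    done
}

# Run in a subshell so the caller's environment is left untouched.
(
    unset LC_ALL LC_CTYPE LC_MONETARY LC_NUMERIC LC_TIME LC_MESSAGES
    LANG=zh_TW.eucTW
    LC_COLLATE=zh_TW.eucTW@stroke
    show_locale_settings
)
```

       With the settings from the preceding example, the sketch
       prints the same seven lines as the locale listing.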

SEE ALSO

       Commands: locale(1)

       Functions: setlocale(3)

       Others: i18n_printing(5), iconv_intro(5), l10n_intro(5),
       Unicode(5)

       Writing Software for the International Market

       Using International Software

       System Administration
