iconv_ibmkanji(5)

NAME
DESCRIPTION
NOTES
SEE ALSO

NAME

       iconv_ibmkanji  - Specification for controlling conversion
       between IBM Kanji and Tru64 UNIX Japanese codesets

DESCRIPTION

       The iconv utility can convert the encoding of characters
       between IBM Kanji System Characters (IBM Kanji) and one of
       the following Tru64 UNIX codesets: DEC Kanji, Super DEC
       Kanji, Japanese EUC, or Shift JIS. You choose the type of
       conversion by specifying the appropriate values for the
       utility's from-code and to-code parameters, as follows:

       -----------------------------------------------------
       Type of Code Conversion        from-code   to-code
       -----------------------------------------------------
       IBM Kanji to DEC Kanji         ibmkanji    deckanji
       IBM Kanji to Super DEC Kanji   ibmkanji    sdeckanji
       IBM Kanji to Japanese EUC      ibmkanji    eucJP
       IBM Kanji to Shift JIS         ibmkanji    SJIS
       DEC Kanji to IBM Kanji         deckanji    ibmkanji
       Super DEC Kanji to IBM Kanji   sdeckanji   ibmkanji
       Japanese EUC to IBM Kanji      eucJP       ibmkanji
       Shift JIS to IBM Kanji         SJIS        ibmkanji
       -----------------------------------------------------
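
       As a rough illustration of the table above, the following
       Python sketch checks a from-code/to-code pair and builds an
       iconv command line. The helper name build_iconv_command and
       the input file name are hypothetical; -f and -t are the
       standard iconv options for the from-code and to-code.

```python
# Codesets that can be paired with ibmkanji, taken from the table above.
TRU64_CODESETS = {"deckanji", "sdeckanji", "eucJP", "SJIS"}


def build_iconv_command(from_code, to_code, path):
    """Return an iconv argument list for a supported IBM Kanji conversion.

    Hypothetical helper: one side must be ibmkanji and the other must be
    one of the four Tru64 UNIX Japanese codesets (names are case-sensitive).
    """
    if from_code == "ibmkanji":
        other = to_code
    elif to_code == "ibmkanji":
        other = from_code
    else:
        other = None
    if other not in TRU64_CODESETS:
        raise ValueError(f"unsupported conversion: {from_code} -> {to_code}")
    return ["iconv", "-f", from_code, "-t", to_code, path]
```

       For example, build_iconv_command("ibmkanji", "eucJP",
       "in.dat") yields an argument list that could be passed to a
       process launcher on a system where these converters are
       installed.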

       Conversion behavior for the following items is affected by
       the definition of environment variables or profile entries
       in the user's environment. For more information, see the
       "Environment Variables" and "Profile File" sections.

       The UDC (User-Defined Character) mapping table that is used
       for UDC conversion

              This table must be an ASCII text file that contains
              UDC mapping information. The table affects
              conversion of user-defined characters between the
              codesets.

       The EBCDIC to/from ISO code (ASCII, JIS Roman characters)
       mapping table that is used for conversion

              This table must be an ASCII text file that contains
              information on how to map characters between EBCDIC
              and ISO code.

       The K-shift code

              This is a one- or two-byte hexadecimal code that
              marks the beginning of Kanji mode.

       The A-shift code

              This is a one- or two-byte hexadecimal code that
              marks the beginning of EBCDIC mode.

       The status of the initial mode (Kanji or EBCDIC) at the
       time the iconv command starts, or the first time the
       iconv() function is called after the iconv_open() function
       initializes the converter in a program

              The status keywords are either kanji_mode or
              ebcdic_mode.

       How to treat undefined characters when these are detected
       in Kanji mode

              Specify this action by using one of the following
              keywords:

                     Stop codeset conversion (abort).

                     Output the undefined characters without any
                     processing and continue codeset conversion
                     (pass).

                     Output padding characters instead of the
                     undefined characters and continue codeset
                     conversion (replace).

                     Ignore the undefined characters and continue
                     codeset conversion.

       The two-byte padding character used in Kanji mode

              This value is meaningful when replace is chosen for
              the processing of undefined characters in Kanji
              mode. Specify the padding character by its
              hexadecimal value.

       How to treat undefined characters when these are detected
       in EBCDIC mode

              Specify this action by using one of the following
              keywords:

                     Stop codeset conversion (abort).

                     Output the undefined characters without any
                     processing and continue codeset conversion
                     (pass).

                     Output padding characters instead of the
                     undefined characters and continue codeset
                     conversion (replace).

                     Ignore the undefined characters and continue
                     codeset conversion.

       The one-byte padding character used in EBCDIC mode

              This value is meaningful when replace is chosen for
              the processing of undefined characters in EBCDIC
              mode. Specify the padding character by its
              hexadecimal value.
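
       To make the role of the shift codes concrete, here is a
       minimal Python sketch that splits an IBM Kanji byte stream
       into Kanji-mode and EBCDIC-mode runs. It is an illustration,
       not the converter's implementation. It assumes one-byte
       shift codes with the default values 0x0e and 0x0f, two-byte
       characters in Kanji mode, and one-byte characters in EBCDIC
       mode; the function name split_by_mode is hypothetical.

```python
K_SHIFT = 0x0E  # default K-shift code: marks the beginning of Kanji mode
A_SHIFT = 0x0F  # default A-shift code: marks the beginning of EBCDIC mode


def split_by_mode(data, initial_state="ebcdic_mode"):
    """Split IBM Kanji bytes into (mode, bytes) runs, dropping shift codes."""
    runs = []
    mode = initial_state
    current = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in (K_SHIFT, A_SHIFT):
            if current:  # close the run that ends at this shift code
                runs.append((mode, bytes(current)))
                current = bytearray()
            mode = "kanji_mode" if byte == K_SHIFT else "ebcdic_mode"
            i += 1
        elif mode == "kanji_mode":
            current += data[i:i + 2]  # assumed: two bytes per character
            i += 2
        else:
            current.append(byte)      # assumed: one byte per character
            i += 1
    if current:
        runs.append((mode, bytes(current)))
    return runs
```

       With the default initial state, the input bytes c1 0e 44 e9
       0f c2 produce an EBCDIC run, a Kanji run, and a second
       EBCDIC run.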

       When the to-code parameter for the conversion is ibmkanji,
       you can also specify the following items for conversion
       behavior:

       Whether the initial shift code is output at the start of
       conversion if the status of the initial mode (Kanji or
       EBCDIC) is different from the mode of the first input
       character

              The start of conversion is the time the iconv
              utility starts processing, or when the iconv()
              function is called just after opening the converter
              with iconv_open(). Keyword values for this item are
              yes or no.

       Whether the utility outputs the last shift code when
       iconv() is called with a zero-length input string and the
       current mode (Kanji or EBCDIC) is different from the mode
       specified by the last shift state

              Keyword values for this item are yes or no.

       The last status (Kanji mode or EBCDIC mode)

              Specify kanji_mode or ebcdic_mode for this value.
              It is meaningful only when yes is the setting for
              whether the utility outputs the last shift code.

       If the items that control conversion behavior are
       specified by both environment variables and the profile
       file, values set by environment variables override values
       set by comparable entries in the profile. Note that values
       for all conversion control items are case-sensitive,
       whether they are set by environment variables or in the
       profile. The following table contains the default values
       for each conversion control item:

       ----------------------------------------------------
       Conversion Control Item               Default Value
       ----------------------------------------------------
       UDC mapping table                     None
       K shift code                          0x0e
       A shift code                          0x0f
       Initial state                         ebcdic_mode
       Processing for undefined characters
       in Kanji mode                         abort
       Processing for undefined characters
       in EBCDIC mode                        pass
       ----------------------------------------------------
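
       The precedence rule (environment variable, then profile
       entry, then default) can be sketched as follows. This is
       only an illustration: the function name resolve is
       hypothetical, and the items are keyed by the profile entry
       names listed in the "Profile File" section for convenience.

```python
# Defaults copied from the table above.
DEFAULTS = {
    "k_shift_code": "0x0e",
    "a_shift_code": "0x0f",
    "initial_state": "ebcdic_mode",
    "kanji_except_proc": "abort",
    "ebcdic_except_proc": "pass",
}


def resolve(item, env_values, profile_values):
    """Pick a control item value: environment first, then profile, then default."""
    for source in (env_values, profile_values, DEFAULTS):
        if item in source:
            return source[item]
    return None  # for example the UDC mapping table, which has no default
```

       Here env_values and profile_values are plain dictionaries
       that the caller has already filled from the environment and
       from the profile file.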

       The default padding characters  are  white  spaces,  whose
       code  values for each destination codeset are noted in the
       following table. These padding characters are output  when
       you specify replace for processing of undefined characters
       and do not explicitly specify the padding character.

       ---------------------------------------------------
       Mode          Default Value   Destination Codeset
       ---------------------------------------------------
       Kanji mode    0x44e9          ibmkanji
                     0xa1a1          deckanji, sdeckanji,
                                     or eucJP
                     0x8140          SJIS
       EBCDIC mode   0x40            ibmkanji
                     0x20            deckanji, sdeckanji,
                                     eucJP, or SJIS
       ---------------------------------------------------
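
       The following Python sketch shows how the abort, pass, and
       replace settings might act on a single undefined character,
       using the default padding values from the table above. It
       covers only those three keywords, and the function name
       handle_undefined is hypothetical.

```python
# Default padding characters from the table above, keyed by destination codeset.
KANJI_PADDING = {"ibmkanji": 0x44E9, "deckanji": 0xA1A1, "sdeckanji": 0xA1A1,
                 "eucJP": 0xA1A1, "SJIS": 0x8140}
EBCDIC_PADDING = {"ibmkanji": 0x40, "deckanji": 0x20, "sdeckanji": 0x20,
                  "eucJP": 0x20, "SJIS": 0x20}


def handle_undefined(char, policy, mode, to_code):
    """Return the bytes to output for one undefined character."""
    if policy == "abort":
        raise ValueError(f"undefined character {char.hex()} in {mode}")
    if policy == "pass":
        return char  # output the character without any processing
    if policy == "replace":
        if mode == "kanji_mode":
            return KANJI_PADDING[to_code].to_bytes(2, "big")
        return EBCDIC_PADDING[to_code].to_bytes(1, "big")
    raise ValueError(f"policy not covered by this sketch: {policy}")
```

       An explicitly configured padding character would take the
       place of the dictionary lookup in the replace branch.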

       The default EBCDIC-ISO mapping table is as follows:

       For conversion from IBM Kanji to other codesets:
              /usr/lib/nls/loc/iconv/data/ebcdic_kana.tbl

       For conversion from other codesets to IBM Kanji:
              /usr/lib/nls/loc/iconv/data/kana_ebcdic.tbl

       These mapping tables map between EBCDIC and ISO code, which
       includes JIS Roman characters. The kana_ebcdic.tbl mapping
       table also maps ISO lowercase characters to EBCDIC
       uppercase characters.

       The  following default values for conversion control items
       are meaningful when the iconv utility's to-code conversion
       parameter is ibmkanji:

       ---------------------------------------------
       Conversion Control Item          Default
       ---------------------------------------------
       Output the initial shift code?   yes
       Output the last shift code?      yes
       Output the last status?          ebcdic_mode
       ---------------------------------------------
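
       One possible reading of these three items, written as a
       Python sketch for output in ibmkanji. It is not the
       converter's actual logic. It assumes the default one-byte
       shift codes, and the function and parameter names are
       hypothetical.

```python
K_SHIFT = b"\x0e"  # default K-shift code
A_SHIFT = b"\x0f"  # default A-shift code
SHIFT_TO = {"kanji_mode": K_SHIFT, "ebcdic_mode": A_SHIFT}


def emit_ibmkanji(runs, initial_state="ebcdic_mode", initial_shift="yes",
                  trailer_shift="yes", last_state="ebcdic_mode"):
    """Join (mode, bytes) runs into one stream, inserting shift codes."""
    out = bytearray()
    mode = initial_state
    first = True
    for run_mode, payload in runs:
        if run_mode != mode:
            # Only the very first shift code is optional in this sketch.
            if not first or initial_shift == "yes":
                out += SHIFT_TO[run_mode]
            mode = run_mode
        out += payload
        first = False
    # At the end, return to the configured last state if requested.
    if trailer_shift == "yes" and mode != last_state:
        out += SHIFT_TO[last_state]
    return bytes(out)
```

       With the defaults, a single Kanji run is wrapped in a
       K-shift code and an A-shift code; with both yes/no items
       set to no, the run is output unchanged.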


   Environment Variables
       This  section discusses the environment variables that you
       can set to control  conversion  behavior.  The  names  for
       these variables adhere to the following format:

       fromcode_tocode_controlitem

       The name segments for fromcode or tocode can be one of the
       following keywords:


       ----------------------------
       For Codeset:      Use:
       ----------------------------
       IBM Kanji         IBMKANJI
       DEC Kanji         DECKANJI
       Super DEC Kanji   SDECKANJI
       Japanese EUC      EUCJP
       Shift JIS         SJIS
       ----------------------------

       The name segments for controlitem can be one of the
       following keywords:

       --------------------------------------------------------
       For Control Item:                    Use:
       --------------------------------------------------------
       UDC mapping table                    UDC_TABLE
       EBCDIC-ISO mapping table             EBCDIC_TABLE
       K shift code                         K_SHIFT_CODE
       A shift code                         A_SHIFT_CODE
       Initial state                        INITIAL_STATE
       Processing of undefined characters
       in Kanji mode                        KANJI_EXCEPT_PROC
       Processing of undefined characters
       in EBCDIC mode                       EBCDIC_EXCEPT_PROC
       Padding characters
       in Kanji mode                        PADDING_2BYTE_CHAR
       Padding characters
       in EBCDIC mode                       PADDING_1BYTE_CHAR
       Output initial
       shift code                           INITIAL_SHIFT_CODE
       Output last
       shift code                           TRAILER_SHIFT_CODE
       Last status                          LAST_STATE
       File path of the profile             PROFILE
       --------------------------------------------------------
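
       The naming format can be sketched in Python as follows. The
       helper names env_var_name and lookup are hypothetical; the
       name segments come from the two tables above.

```python
import os

# Name segments for each codeset, from the first table above.
CODESET_SEGMENT = {"ibmkanji": "IBMKANJI", "deckanji": "DECKANJI",
                   "sdeckanji": "SDECKANJI", "eucJP": "EUCJP", "SJIS": "SJIS"}


def env_var_name(from_code, to_code, control_item):
    """Compose a fromcode_tocode_controlitem environment variable name."""
    return "_".join((CODESET_SEGMENT[from_code],
                     CODESET_SEGMENT[to_code],
                     control_item))


def lookup(from_code, to_code, control_item, environ=os.environ):
    """Return the variable's value, or None when it is not set."""
    return environ.get(env_var_name(from_code, to_code, control_item))
```

       For a conversion from Japanese EUC to IBM Kanji, the K shift
       code would be read from EUCJP_IBMKANJI_K_SHIFT_CODE.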

       Following are examples of using the setenv C shell command
       to define environment variables to control conversion
       behavior. In these examples, the fromcode name segment
       indicates Japanese EUC and the tocode name segment
       indicates IBM Kanji:

       setenv EUCJP_IBMKANJI_UDC_TABLE            eucjp_ibmkanji_udc.tbl
       setenv EUCJP_IBMKANJI_EBCDIC_TABLE         kana_ebcdic.tbl
       setenv EUCJP_IBMKANJI_K_SHIFT_CODE         0x0e
       setenv EUCJP_IBMKANJI_A_SHIFT_CODE         0x0f
       setenv EUCJP_IBMKANJI_INITIAL_STATE        ebcdic_mode
       setenv EUCJP_IBMKANJI_KANJI_EXCEPT_PROC    replace
       setenv EUCJP_IBMKANJI_EBCDIC_EXCEPT_PROC   replace
       setenv EUCJP_IBMKANJI_PADDING_2BYTE_CHAR   0x44e9
       setenv EUCJP_IBMKANJI_PADDING_1BYTE_CHAR   0x40
       setenv EUCJP_IBMKANJI_INITIAL_SHIFT_CODE   yes
       setenv EUCJP_IBMKANJI_TRAILER_SHIFT_CODE   yes
       setenv EUCJP_IBMKANJI_LAST_STATE           ebcdic_mode
       setenv EUCJP_IBMKANJI_PROFILE              .eucjp_ibmkanji_profile

   Directory Search Path
       When you specify a file name without a directory, the
       iconv utility searches the following directories and uses
       the first file found:

       1.  Current directory
       2.  Home directory
       3.  The iconv/data subdirectory of the directory specified
           by the environment variable LOCPATH
       4.  /usr/lib/nls/loc/iconv/data
       5.  /usr/i18n/lib/nls/loc/iconv/data

       If  you  specify a relative directory path for a file, the
       utility searches these same directories in the same  order
       and uses the first file found.
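
       The search order can be sketched in Python as follows. The
       function names are hypothetical, and skipping the LOCPATH
       step when the variable is unset is an assumption that this
       page does not state.

```python
import os


def candidate_paths(name, cwd, home, locpath=None):
    """List the places to look for a file name, in search order."""
    dirs = [cwd, home]
    if locpath:  # assumption: skip this step when LOCPATH is unset
        dirs.append(os.path.join(locpath, "iconv", "data"))
    dirs += ["/usr/lib/nls/loc/iconv/data",
             "/usr/i18n/lib/nls/loc/iconv/data"]
    return [os.path.join(d, name) for d in dirs]


def find_first(name, cwd, home, locpath=None):
    """Return the first candidate that exists as a file, or None."""
    for path in candidate_paths(name, cwd, home, locpath):
        if os.path.isfile(path):
            return path
    return None
```

       A caller would typically pass os.getcwd(), the user's home
       directory, and os.environ.get("LOCPATH").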

   Profile File
       Entry  lines  in  the profile file adhere to the following
       format:

       entry_name        string_value

       The entry_name and string_value fields are separated by
       spaces or tabs. Do not append a colon (:) after
       entry_name. The file can also include blank lines and
       comment entries, which begin with the # character.

       Following are the entry_name values for different
       conversion control items:

       ------------------------------------------------------------
       Conversion Control Item           entry_name
       ------------------------------------------------------------
       UDC mapping table                 udc_mapping_table
       EBCDIC-ISO mapping table          ebcdic_mapping_table
       K shift code                      k_shift_code
       A shift code                      a_shift_code
       Initial state                     initial_state
       Processing undefined characters
       in Kanji mode                     kanji_except_proc
       Processing undefined characters
       in EBCDIC mode                    ebcdic_except_proc
       Padding character
       in Kanji mode                     padding_2byte_char
       Padding character
       in EBCDIC mode                    padding_1byte_char
       Output initial
       shift code                        output_initial_shift_code
       Output last
       shift code                        output_trailer_shift_code
       Last state                        last_state
       ------------------------------------------------------------

       Following is a sample profile for converting from Japanese
       EUC to IBM Kanji.

       #
       #  sample profile for eucJP_ibmkanji
       #
       udc_mapping_table           eucjp_ibmkanji_udc.tbl
       ebcdic_mapping_table        kana_ebcdic.tbl
       k_shift_code                0x0e          # ebcdic -> kanji
       a_shift_code                0x0f          # kanji -> ebcdic
       initial_state               ebcdic_mode
       kanji_except_proc           replace
       ebcdic_except_proc          replace
       padding_2byte_char          0x44e9        # kanji mode
       padding_1byte_char          0x40          # ebcdic mode
       output_initial_shift_code   yes
       output_trailer_shift_code   yes
       last_state                  ebcdic_mode
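
       A minimal Python sketch of reading such a profile into a
       dictionary. The function name parse_profile is
       hypothetical, and treating everything after a # as a
       comment (including trailing comments such as those in the
       sample) is an assumption.

```python
def parse_profile(text):
    """Parse 'entry_name string_value' lines into a dictionary."""
    entries = {}
    for line in text.splitlines():
        # Assumed: a # starts a comment anywhere on the line.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        # Fields are separated by spaces or tabs.
        name, value = line.split(None, 1)
        entries[name] = value.strip()
    return entries
```

       The resulting dictionary maps entry names such as
       k_shift_code to their string values; values are kept as
       strings because they are case-sensitive keywords, file
       names, or hexadecimal codes.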

       The default file names for the profile are as follows:

       -----------------------------------------------------------
       Code Conversion                Default Profile Name
       -----------------------------------------------------------

       IBM Kanji to DEC Kanji         .ibmkanji_deckanji_profile
       IBM Kanji to Super DEC Kanji   .ibmkanji_sdeckanji_profile
       IBM Kanji to Shift JIS         .ibmkanji_sjis_profile
       IBM Kanji to Japanese EUC      .ibmkanji_eucjp_profile

       DEC Kanji to IBM Kanji         .deckanji_ibmkanji_profile
       Super DEC Kanji to IBM Kanji   .sdeckanji_ibmkanji_profile
       Shift JIS to IBM Kanji         .sjis_ibmkanji_profile
       Japanese EUC to IBM Kanji      .eucjp_ibmkanji_profile
       -----------------------------------------------------------

       By default, the iconv utility checks the directory  search
       path  mentioned in the "Directory Search Path" section and
       uses the first profile it finds.  However,  you  can  also
       specify an arbitrary file path for your profile instead of
       the default names by defining  the  following  environment
       variables:

       -----------------------------------------------------------------
       Code Conversion                Profile Path Environment Variable
       -----------------------------------------------------------------
       IBM Kanji to DEC Kanji         IBMKANJI_DECKANJI_PROFILE
       IBM Kanji to Super DEC Kanji   IBMKANJI_SDECKANJI_PROFILE
       IBM Kanji to Shift JIS         IBMKANJI_SJIS_PROFILE
       IBM Kanji to Japanese EUC      IBMKANJI_EUCJP_PROFILE

       DEC Kanji to IBM Kanji         DECKANJI_IBMKANJI_PROFILE
       Super DEC Kanji to IBM Kanji   SDECKANJI_IBMKANJI_PROFILE
       Shift JIS to IBM Kanji         SJIS_IBMKANJI_PROFILE
       Japanese EUC to IBM Kanji      EUCJP_IBMKANJI_PROFILE
       -----------------------------------------------------------------
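
       The two tables above follow a regular pattern, which the
       following Python sketch reproduces. The function names are
       hypothetical. The sketch returns only a file name or path;
       a bare file name would still be subject to the directory
       search described earlier.

```python
def default_profile_name(from_code, to_code):
    """Default profile file name, for example .eucjp_ibmkanji_profile."""
    return f".{from_code.lower()}_{to_code.lower()}_profile"


def profile_env_var(from_code, to_code):
    """Environment variable that overrides the profile path."""
    return f"{from_code.upper()}_{to_code.upper()}_PROFILE"


def profile_path(from_code, to_code, environ):
    """Use the environment variable if set, else the default name."""
    return environ.get(profile_env_var(from_code, to_code),
                       default_profile_name(from_code, to_code))
```

       Passing os.environ as the environ argument gives the
       behavior for the current process.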


   UDC Mapping Table
       Entries  in  a  UDC  mapping table adhere to the following
       format:

       fromcode      tocode

       Each of these values is a two-byte hexadecimal number.  In
       the  case  of Super DEC Kanji and Japanese EUC, three-byte
       hexadecimal values that begin with  SS3  (0x8f),  such  as
       0x8fxxxx, are also valid.

       You  can  specify  ranges of UDC from and to values in the
       same file entry by using a hyphen to  separate  the  codes
       that start and end each range:

       start_fromcode-end_fromcode   start_tocode-end_tocode

       When specifying entries that include ranges of values, the
       number of codes in the from range must  always  equal  the
       number  of  codes in the to range. A UDC mapping table can
       also include blank lines and comment  lines,  which  begin
       with  the  #  character.  Following is an example of a UDC
       mapping table:

       # ibmkanji            eucJP

       0x6941-0x72fe         0xf5a1-0xfefe         # udc
       0x7341-0x7cfe         0x8ff5a1-0X8ffefe     # udc
       0x7d41-0x7ffe         0x8feea1-0X8ff0fe     # udc

       The first entry in this file  specifies  a  range  of  IBM
       Kanji  values  from  0x6941  to  0x72fe that are mapped to
       Japanese EUC code values in the range  0xf5a1  to  0xfefe.
       You  can find additional sample UDC mapping table files in
       the /usr/i18n/examples/iconv/data directory.
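
       A minimal Python sketch of parsing UDC mapping entries. It
       only records the start and end of each range and does not
       expand ranges into individual codes, because how codes are
       counted within a two-byte range depends on codeset layout
       details that this page does not give. The function names
       are hypothetical.

```python
def parse_code_range(field):
    """Turn '0x6941' or '0x6941-0x72fe' into a (start, end) pair of ints."""
    start, _, end = field.partition("-")
    first = int(start, 16)  # int() accepts a 0x or 0X prefix in base 16
    return first, int(end, 16) if end else first


def parse_udc_table(text):
    """Parse entries into ((from_start, from_end), (to_start, to_end))."""
    entries = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue  # blank line or comment line
        if len(fields) != 2:
            raise ValueError(f"expected 'fromcode tocode': {line!r}")
        entries.append((parse_code_range(fields[0]),
                        parse_code_range(fields[1])))
    return entries
```

       A single-code entry is returned as a range whose start and
       end are equal.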

   EBCDIC-ISO Mapping Table
       Entries in an EBCDIC-ISO mapping table adhere to the
       following format:

       fromcode       tocode

       Each code is a one-byte hexadecimal number. You can
       specify a range of character codes as follows:

       start_fromcode-end_fromcode     start_tocode-end_tocode

       When using the range format, the number of hex values in
       the from range must be the same as the number of hex
       values in the to range.

       The EBCDIC-ISO mapping table can also include blank lines
       and comment entries, which begin with the # character.

       Following is an example of an EBCDIC-ISO code mapping table:

       # EBCDIC                Kana

       0x40                    0x20            # space
       0x4f                    0x21            # '!'
       0x7f                    0x22            # '"'
         .                       .
         .                       .
         .                       .
       0xc1-0xc9               0x41-0x49       # 'A' - 'I'
       0xd1-0xd9               0x4a-0x52       # 'J' - 'R'
       0xe2-0xe9               0x53-0x5a       # 'S' - 'Z'
         .                       .
         .                       .
         .                       .

       In this example, the first column of values contains from
       codes and the second column contains to codes. The first
       three value entry lines specify mapping for single
       characters, whereas the last three value entry lines
       specify mapping for ranges of characters. You can find
       additional sample EBCDIC-ISO mapping tables in the
       /usr/i18n/lib/nls/loc/iconv/data directory.
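
       Because each code here is a single byte, range entries can
       be expanded directly. The following Python sketch builds a
       byte-to-byte dictionary and enforces the equal-length rule.
       It assumes that codes within a range pair up in ascending
       order, which matches the 'A' - 'I' style entries in the
       example; the function names are hypothetical.

```python
def parse_byte_range(field):
    """Turn '0x40' or '0xc1-0xc9' into a range of byte values."""
    start, _, end = field.partition("-")
    first = int(start, 16)
    last = int(end, 16) if end else first
    return range(first, last + 1)


def load_ebcdic_table(text):
    """Build a {from_byte: to_byte} dictionary from mapping entries."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue  # blank line or comment line
        if len(fields) != 2:
            raise ValueError(f"expected 'fromcode tocode': {line!r}")
        src = parse_byte_range(fields[0])
        dst = parse_byte_range(fields[1])
        if len(src) != len(dst):
            raise ValueError(f"range sizes differ: {line!r}")
        mapping.update(zip(src, dst))  # assumed: pair codes in ascending order
    return mapping
```

       Looking up 0xc1 in a table built from the example entries
       gives 0x41, the ISO code for 'A'.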

NOTES

       This reference page contains code conversion
       specifications that apply only to conversion between IBM
       Kanji System characters and the DEC Kanji, Super DEC Kanji,
       Japanese EUC, and Shift JIS codesets. Refer to
       iconv_JEF(5) for code conversion specifications between
       Fujitsu JEF characters and the DEC Kanji, Super DEC Kanji,
       Japanese EUC, and Shift JIS codesets. Refer to
       iconv_KEIS(5) for code conversion specifications between
       Hitachi KEIS characters and the DEC Kanji, Super DEC
       Kanji, Japanese EUC, and Shift JIS codesets. Refer to
       iconv_intro(5) for information about conversion between
       DEC Kanji, Super DEC Kanji, Japanese EUC, Shift JIS, and
       other Tru64 UNIX codesets.
