Date:      Sun, 27 Jun 1999 23:17:19 +0900
From:      Motoyuki Konno <motoyuki@snipe.rim.or.jp>
To:        "Jordan K. Hubbard" <jkh@zippy.cdrom.com>
Cc:        Motoyuki Konno <motoyuki@snipe.rim.or.jp>, Nik Clayton <nclayton@lehman.com>, Jun Kuriyama <kuriyama@sky.rim.or.jp>, doc@FreeBSD.ORG, freebsd-translate@ngo.org.uk, jdp@FreeBSD.ORG
Subject:   Re: Resolution: FDP reorganisation 
Message-ID:  <199906271417.XAA06581@rei.snipe.rim.or.jp>
References:  <67622.930333696@zippy.cdrom.com> 

Hi,

"Jordan K. Hubbard" <jkh@zippy.cdrom.com> wrote:
> OK, so the Japanese folks have some sort of auto-conversion.  That
> takes care of strictly the Japanese language, but what about the
> Chinese folks or the others that Nik pointed out?  It seemed to me
> that he was looking for a much wider convention here, not just a
> solution to the ja problem.

If you want to know more about this, please read Ken Lunde's book
"CJKV Information Processing" (O'Reilly).

#  CJKV means Chinese, Japanese, Korean & Vietnamese.


--------------------

In general:

  ISO-2022:  The ISO-2022 encodings listed below are '7-bit encodings':
             no byte in the encoded text has its 8th bit set.
             This makes them very useful for e-mail and netnews.
             
  EUC:  EUC is short for 'Extended UNIX Code'.
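
As a rough illustration of the 7-bit point, here is a minimal Python
sketch. It assumes Python's standard codec names (iso2022_jp, euc_jp,
iso2022_kr, euc_kr); the sample strings and the helper is_seven_bit are
only for illustration and are not part of any standard.

```python
# Sample strings are illustrative; codec names are Python's built-in ones.
SAMPLES = {
    "iso2022_jp": "日本語",
    "euc_jp": "日本語",
    "iso2022_kr": "한국어",
    "euc_kr": "한국어",
}


def is_seven_bit(data: bytes) -> bool:
    """Return True if no byte in data has its 8th (high) bit set."""
    return all(b < 0x80 for b in data)


for codec, text in SAMPLES.items():
    encoded = text.encode(codec)
    # Show whether the encoded bytes are 7-bit clean, and the raw bytes.
    print(f"{codec:11s} 7-bit={is_seven_bit(encoded)}  {encoded!r}")
```

The ISO-2022 variants should come out 7-bit clean (they switch
character sets with escape sequences), while the EUC variants set the
high bit on every byte of a multibyte character.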


Japanese
--------

character set  :  JIS X 0208
encoding system:  JIS, SJIS, EUC-JP

    o  JIS    : also known as 'ISO-2022-JP', used for e-mail and
                netnews.  ISO-2022-JP is defined in RFC 1468.

    o  SJIS   : short for 'Shift JIS'.  DOS/Windows computers
                and Macintosh use SJIS as their internal code.

    o  EUC-JP : most UNIX computers use EUC-JP as internal code.

    Conversion between JIS, SJIS and EUC-JP is very easy.
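
To show why the conversion is easy, here is a minimal Python sketch of
the usual byte arithmetic for JIS X 0208 characters. The function names
(euc_to_jis, jis_to_sjis, euc_jp_to_sjis) are made up for this example,
and the sketch handles only ASCII and two-byte JIS X 0208 characters; it
does not handle half-width katakana or JIS X 0212.

```python
def euc_to_jis(e1: int, e2: int) -> tuple[int, int]:
    """EUC-JP is the JIS code with the high bit set on both bytes."""
    return e1 - 0x80, e2 - 0x80


def jis_to_sjis(j1: int, j2: int) -> tuple[int, int]:
    """Map a JIS X 0208 byte pair (0x21..0x7E each) to Shift JIS."""
    if j1 % 2:                 # odd row: second byte lands in 0x40..0x9E
        s2 = j2 + 0x1F
        if s2 >= 0x7F:         # Shift JIS skips 0x7F as a second byte
            s2 += 1
    else:                      # even row: second byte lands in 0x9F..0xFC
        s2 = j2 + 0x7E
    s1 = (j1 + 1) // 2 + 0x70
    if s1 >= 0xA0:             # skip the 0xA0..0xDF block (single-byte kana)
        s1 += 0x40
    return s1, s2


def euc_jp_to_sjis(data: bytes) -> bytes:
    """Convert EUC-JP bytes (ASCII + JIS X 0208 only) to Shift JIS."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                               # ASCII passes through
            out.append(b)
            i += 1
        elif 0xA1 <= b <= 0xFE and i + 1 < len(data):
            out.extend(jis_to_sjis(*euc_to_jis(b, data[i + 1])))
            i += 2
        else:
            raise ValueError(f"unsupported byte 0x{b:02X} at offset {i}")
    return bytes(out)


text = "漢字 kanji かな"
print(euc_jp_to_sjis(text.encode("euc_jp")) == text.encode("shift_jis"))
```

The last line compares the hand-rolled result with Python's own
shift_jis codec as a sanity check.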


Korean
------

character set   :  KS X 1001
encoding system :  ISO-2022-KR, EUC-KR

    o  ISO-2022-KR :  defined in RFC 1557. similar to ISO-2022-JP for
                      Japanese.
    o  EUC-KR      :  similar to EUC-JP for Japanese.
                      I have heard that many Korean people use EUC-KR
                      for e-mail, not ISO-2022-KR.

Chinese Taiwan
--------------

character set   :  CNS 11643 (traditional Chinese characters)
                   also known as 'Big5' (*1).
encoding system :  ISO-2022-CN (*2), EUC-TW, Big5

   o  ISO-2022-CN :  defined in RFC 1922.
   o  EUC-TW      :  similar to EUC-JP for Japanese.
   o  Big5        :  Big5 encoding supports more characters than EUC-TW.
                     Ken Lunde says 'It seems a bit silly to compare
                     Big Five and EUC-TW encodings because they are
                     so different from one another' in his 'CJKV' book.

Chinese Mainland
----------------

character set   :  GB 2312 (simplified Chinese characters)
encoding system :  ISO-2022-CN (*2), EUC-CN, GBK

   o  ISO-2022-CN :  see the section 'Chinese Taiwan'.
   o  EUC-CN      :  similar to EUC-JP for Japanese.
   o  GBK         :  Windows computers use GBK as internal code.
                     EUC-CN is a subset of GBK.
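
The subset relationship can be checked with a minimal Python sketch.
It assumes Python's gb2312 codec (which uses the EUC-CN byte layout) and
its gbk codec; the helper name encodable and the sample strings are only
for illustration.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


simplified = "中文编码"   # simplified characters, all in GB 2312
traditional = "龍"        # traditional form, not in GB 2312

# Text that fits in GB 2312 produces the same bytes under GBK.
print(simplified.encode("gb2312") == simplified.encode("gbk"))

# GBK adds characters that EUC-CN cannot represent.
print(encodable(traditional, "gb2312"), encodable(traditional, "gbk"))
```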


*1:  To be exact, CNS 11643 is a corrected and supplemented version
     of 'Big5'.

*2:  ISO-2022-CN supports both CNS (Taiwan) and GB (Chinese Mainland)
     character sets.

--
------------------------------------------------------------------------
Motoyuki Konno                  mkonno@res.yamanashi-med.ac.jp   (Univ)
                                motoyuki@snipe.rim.or.jp         (Home)
                                motoyuki@FreeBSD.ORG  (FreeBSD Project)
Yamanashi Medical University    http://www.freebsd.org/~motoyuki/ (WWW)

