Date:      Fri, 01 Sep 2000 18:21:58 +0100
From:      Konstantin Chuguev <Konstantin.Chuguev@dante.org.uk>
To:        "Andrey A. Chernov" <ache@nagual.pp.ru>
Cc:        Boris Popov <bp@butya.kz>, freebsd-arch@FreeBSD.ORG, freebsd-i18n@FreeBSD.ORG
Subject:   Re: Proposal to include iconv library in the base system.
Message-ID:  <39AFE5B6.1F418EDD@dante.org.uk>
References:  <Pine.BSF.4.10.10008241719320.80086-100000@lion.butya.kz> <20000901185945.A29804@nagual.pp.ru> <39AFD666.880FE6C@dante.org.uk> <20000901205825.A30569@nagual.pp.ru>

"Andrey A. Chernov" wrote:

> On Fri, Sep 01, 2000 at 05:16:38PM +0100, Konstantin Chuguev wrote:
> >    * new filesystems use Unicode encodings: UCS-2 (Windows), some may use
> >      UTF-8. These encodings are not supported by XLAT.
>
> I assume Windows (Unicode) <-> 8bit charset tables are loadable
> too. Doesn't?
>

Yes, they are. iconv always loads 2 CES modules for conversion. A CES module can
load 0 or more CCS modules. Let me show you a few examples:

If we are converting between koi8-r and UCS-2, in either direction, there
will be 3 modules loaded:
(CES) _tbl_simple -> (CCS) koi8-r
(CES) ucs-2

For conversion between koi8-r and windows-1251, there will be 3 modules again:
(CES) _tbl_simple -> (CCS) koi8-r
(CES) _tbl_simple -> (CCS) windows-1251
(Note that only one instance of the _tbl_simple module will be loaded, as
modules are shareable; there might be two separate small sets of structures,
one per CES->CCS binding, allocated at iconv_open time and freed at iconv_close
time.)

For conversion between UTF-8 and EUC-JP, 6 modules are required:
(CES) utf-8
(CES) euc-jp -> (CCS) us-ascii
             -> (CCS) jis_x0208-1983
             -> (CCS) jis_x0201
             -> (CCS) jis_x0212-1990

To convert between EUC-JP and ISO-2022-JP we need 6 modules:
(CES) euc-jp -> (CCS) us-ascii
             -> (CCS) jis_x0208-1983
             -> (CCS) jis_x0201
             -> (CCS) jis_x0212-1990
(CES) iso-2022-jp -> (CCS) us-ascii
                  -> (CCS) jis_x0208-1983
                  -> (CCS) jis_x0201
                  -> (CCS) jis_x0212-1990
Again, all CCS modules will be shared.
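
The module counts in the examples above can be checked with a small model.
This is only a sketch: the dependency table is copied from the examples in
this message, the names CES_DEPENDENCIES and modules_for are hypothetical,
and nothing here reflects how the library's real loader is written. It just
shows that counting each shared module once gives 3, 3, 6 and 6.

```python
# Illustrative model only, not the iconv library's loader.
# Each charset maps to (CES module, list of CCS modules), as listed in
# the examples in this message.
JAPANESE_CCS = ["us-ascii", "jis_x0208-1983", "jis_x0201", "jis_x0212-1990"]

CES_DEPENDENCIES = {
    "koi8-r": ("_tbl_simple", ["koi8-r"]),
    "windows-1251": ("_tbl_simple", ["windows-1251"]),
    "ucs-2": ("ucs-2", []),
    "utf-8": ("utf-8", []),
    "euc-jp": ("euc-jp", JAPANESE_CCS),
    "iso-2022-jp": ("iso-2022-jp", JAPANESE_CCS),
}


def modules_for(from_charset, to_charset):
    """Return the set of distinct modules needed for one conversion.

    Modules are shareable, so a module needed by both sides of the
    conversion appears only once in the result.
    """
    modules = set()
    for charset in (from_charset, to_charset):
        ces, ccs_list = CES_DEPENDENCIES[charset]
        modules.add(("CES", ces))
        modules.update(("CCS", ccs) for ccs in ccs_list)
    return modules


if __name__ == "__main__":
    pairs = [
        ("koi8-r", "ucs-2"),
        ("koi8-r", "windows-1251"),
        ("utf-8", "euc-jp"),
        ("euc-jp", "iso-2022-jp"),
    ]
    for src, dst in pairs:
        print(src, "<->", dst, ":", len(modules_for(src, dst)), "modules")
```

Using a set keyed by (kind, name) is what models the sharing: _tbl_simple
and the four Japanese CCS modules are each counted once even when both
sides of the conversion need them.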


>
> > Exactly, this is what was intended. All [UNIX] charsets supported in the
> > FreeBSD distribution (i.e. which are present in the locale directory) PLUS
> > charsets used in other types of filesystems (Windows, Netware?, MacOS?) for
>
> Currently we support Windows and ISO 9660 for CDs, so PLUS Windows
> (Unicode) and ISO 9660 charsets.
>
> If we take Russian example, we need following tables (for kernel only):
>
> 1) KOI8-R <-> CP866 for MSDOS FS
> 2) KOI8-R <-> Unicode for Windows FS

What about Windows < 95? I'm sure people used localized file names there too.
And it was not Unicode.

>
> 3) We also need ISO 9660 conversion scheme, but I not know about
> character set used there.
>

Could anybody please give me a reference to the ISO 9660 specification? I would
also like to know which IBM charsets are used in MSDOS FS for languages other
than Russian (and supported in FreeBSD).


--
          * *        Konstantin Chuguev - Application Engineer
       *      *              Francis House, 112 Hills Road
     *                       Cambridge CB2 1PQ, United Kingdom
 D  A  N  T  E       WWW:    http://www.dante.net








