Date:      Wed, 10 Jun 1998 21:55:44 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        kline@tao.thought.org (Gary Kline)
Cc:        hackers@FreeBSD.ORG
Subject:   Re: internationalization
Message-ID:  <199806102155.OAA13862@usr01.primenet.com>
In-Reply-To: <199806101930.MAA08334@tao.thought.org> from "Gary Kline" at Jun 10, 98 12:30:40 pm

>   I've been into this twice.  The first time, briefly, in '96, and
>   for the past few weeks.  Generating simple, efficient catalogues 
>   for each of the 200+ utilities is a first task.  
> 
>   There is a related issue of the system errs (currently in sys_errlist[]).
>   There are at least two rational ways to turn::
> 
>   $ ENOENT
>   2 No such file or directory
> 
>   into its French equiv::
> 
>   $ ENOENT
>   2 Fichier ou répertoire introuvable
> 
>   I'm going ahead with my current implementation and look forward to
>   hearing from any other hackers who are interested in this.

I'm interested.

Part of the problem here is that FreeBSD doesn't fully support XPG/4.

Another part of the problem is that XPG/4 is encoded multibyte, which
is bad from a number of major perspectives, starting with ISO2022.

I would prefer going to a full-on Unicode implementation to support
all known human languages.

I would suggest an initial 16 bit wchar_t with an assumption of a
zero valued code page designator.  If ISO ever gets around to adding
other code pages, we can deal with that at that time using page
selection.  Meanwhile, we'll be able to interoperate with Microsoft
and Java, which use 16 bit wchar_t encodings.


I think the first (and hardest) step is the shells.  The shells need
to be internationalized based on the fact that they (can) interpret
exit codes to the user as error messages.
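
A minimal sketch of what a catalogue-backed error lookup could look like,
assuming the XPG/4 catopen()/catgets() interface is available.  The
catalogue name "errlist", the set number 1, and the convention that
message N describes errno N are all hypothetical choices made for
illustration; nothing in this thread fixes them.  If no catalogue is
installed, the sketch falls back to the C library's own string.

```c
#include <errno.h>
#include <nl_types.h>
#include <stdio.h>
#include <string.h>

#define ERR_CATALOG "errlist"   /* hypothetical catalogue name */
#define ERR_SET     1           /* hypothetical set number */

/*
 * Copy a message for errnum into buf and return buf.
 * Assumption: message number N in set ERR_SET describes errno N.
 * The text is copied because the pointer returned by catgets()
 * may not stay valid after catclose().
 */
char *
localized_strerror(int errnum, char *buf, size_t len)
{
    const char *fallback = strerror(errnum);
    nl_catd catd = catopen(ERR_CATALOG, NL_CAT_LOCALE);

    if (catd == (nl_catd)-1) {
        /* No catalogue for this locale: use the built-in text. */
        snprintf(buf, len, "%s", fallback);
        return buf;
    }
    /* catgets() returns the fallback if the message is missing. */
    snprintf(buf, len, "%s", catgets(catd, ERR_SET, errnum, fallback));
    catclose(catd);
    return buf;
}
```

A shell would call something like localized_strerror(ENOENT, buf,
sizeof(buf)) after setlocale(LC_ALL, "") and print the result; with no
catalogue installed, that is simply the strerror() text.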

The last time I converted csh, this was absolute hell because the
code was badly organized for internationalization.

The next hardest step is the editors, starting with "vi".  They have
to be able to support Unicode.

I have had FS-based Unicode support working for a very long time,
though it has failed to be committed.  One big issue is that directory
entry blocks must grow from 512b to 1k.  This has a number of
implications for the soft updates work currently in progress.  This is
because, in order to support a maximally sized path component, 512 + 24
bytes is needed for Unicode, as opposed to 256 + 24 (which fits in 512b)
for an 8 bit character set.

If we were to do something stupid, like UTF-7 or UTF-8, it would have
to grow to 5 * 256 + 24, minimally, to support the 5:1 character
expansion that is possible, as opposed to the 2:1 of flat Unicode encoding.
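
The arithmetic above can be sketched as follows.  The 256 character
component limit, the 24 byte overhead, and the 5:1 expansion factor are
the figures from this message; the function names, and the assumption
that entry blocks are sized in powers of two starting at 512 bytes, are
mine and only illustrative.

```c
#include <stddef.h>

enum {
    MAX_COMPONENT_CHARS = 256,  /* maximally sized path component */
    ENTRY_OVERHEAD      = 24,   /* fixed per-entry bytes, per the post */
    MIN_ENTRY_BLOCK     = 512   /* current directory entry block size */
};

/* Worst-case bytes for one directory entry at a given expansion. */
size_t
max_entry_bytes(size_t bytes_per_char)
{
    return (size_t)MAX_COMPONENT_CHARS * bytes_per_char + ENTRY_OVERHEAD;
}

/* Assumption: blocks double in size until the entry fits. */
size_t
smallest_entry_block(size_t need)
{
    size_t block = MIN_ENTRY_BLOCK;

    while (block < need)
        block *= 2;
    return block;
}
```

With these numbers, max_entry_bytes(1) is 280 and fits in 512 bytes,
max_entry_bytes(2) is 536 and needs 1k, and max_entry_bytes(5) is 1304,
which under the power-of-two assumption would need 2k.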

For character set attributed FS's (like NFS v2/v3 will have to be), you
can do the translation in the kernel on the blocks on their way out
(a 2:1 expansion in memory of a 1:1 disk image for a given ISO character
set attribution for the filesystem).
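
As a rough illustration of that 2:1 expansion, here is a user-space
sketch for the simplest case, a filesystem attributed as ISO 8859-1,
whose 256 codes coincide with the first 256 Unicode code points.  Other
ISO character sets would need a per-set lookup table instead of the
direct widening shown here.  The function name and the return
convention are hypothetical; this is not the kernel code referred to
above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Widen an ISO 8859-1 name (1 byte per character on disk) into
 * 16 bit Unicode code units (2 bytes per character in memory).
 * Returns the number of units written, or 0 if dst is too small.
 */
size_t
latin1_to_ucs2(const unsigned char *src, size_t srclen,
    uint16_t *dst, size_t dstcap)
{
    size_t i;

    if (srclen > dstcap)
        return 0;
    for (i = 0; i < srclen; i++)
        dst[i] = src[i];    /* code value is unchanged, only wider */
    return srclen;
}
```

For example, the byte 0xE9 in "r\xe9pertoire" becomes the 16 bit unit
0x00E9; the in-memory name takes exactly twice the bytes of the on-disk
name.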


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message