From owner-freebsd-hackers Wed Jun 10 17:16:55 1998
From: Gary Kline
Message-Id: <199806110015.RAA09151@tao.thought.org>
Subject: Re: internationalization
In-Reply-To: <199806102155.OAA13862@usr01.primenet.com> from Terry Lambert at "Jun 10, 98 09:55:44 pm"
To: tlambert@primenet.com (Terry Lambert)
Date: Wed, 10 Jun 1998 17:15:33 -0700 (PDT)
Cc: hackers@FreeBSD.ORG
Organization: <> thought.org: public access uNix in service... <>

According to Terry Lambert:
[[ ... ]]
> > I'm going ahead with my current implementation and look forward to
> > hearing from any other hackers who are interested in this.

---I've been doing further digging since your mail.  This
black-hole keeps getting more interesting....

>
> I'm interested.
>
> Part of the problem here is that FreeBSD doesn't fully support XPG/4.
>
> Another part of the problem is that XPG/4 is encoded multibyte, which
> is bad from a number of major perspectives, starting with ISO2022.

We've got v 2.0 of the xpg4 library in 2.2.6.  Do you know
if any other flavor of BSD has more complete support?

>
> I would prefer going to a full-on Unicode implementation to support
> all known human languages.
>

This was my first leaning, but I'm increasingly going toward
the ISO families.

> I would suggest an initial 16 bit wchar_t with an assumption of a
> zero valued code page designator.  If ISO ever gets around to adding
> other code pages, we can deal with that at that time using page
> selection.  Meanwhile, we'll be able to interoperate with Microsoft
> and JAVA, which use 16 bit wchar_t encodings.
>
>
> I think the first (and hardest) step is the shells.  The shells need
> to be internationalized based on the fact that they (can) interpret
> exit codes to the user as error messages.

Exit codes, certainly; but where you've got syserror() output,
that's another issue.  Agree that the shells are the base:
csh|tcsh, and the sh|ksh group.

>
> The last time I converted csh, this was absolute hell because the
> code was badly organized for internationalization.
>
> The next hardest step is the editors, starting with "vi".  They have
> to be able to support Unicode.

nvi/nex already have been tweaked for 8-bit international
support.  I learned this accidentally.  Was quite surprised
to see messages in French and German. :-)

Nonetheless, I see why you like the Unicode solution.
Someone said, ``Well, French support is great, but how are
you going to handle Japanese?''

>
> I have had FS-based Unicode support working for a very long time,
> though it has failed to be committed.  One big issue is that directory
> entry blocks must grow from 512b to 1k.  This has a number of
> implications to the soft updates work currently in progress.  This is
> because, in order to support a maximally sized path component, 512 + 24
> bytes is needed for Unicode, as opposed to 256 + 24 (which fits in 512b)
> for an 8 bit character set.

:-( !  How does the ISO2022 model work here?  Isn't it
the same for Japanese and Chinese?

>
> If we were to do something stupid, like UTF-7 or UTF-8, it would have
> to grow to 5 * 256 + 24, minimally, to support 5:1 character expansion
> possible, as opposed to the 2:1 of flat Unicode encoding.

You've lost me here.  What does the translation format do,
or rather, how?

>
> For character set attributed FS's (like NFS v2/v3 will have to be), you
> can do the translation in the kernel on the blocks on their way out
> (a 2:1 expansion in memory of a 1:1 disk image for a given ISO character
> set attribution for the filesystem).
>
>

Thanks for your feedback.  It's probably a better idea to
consider the broader design issues now than to paint
myself into a corner.

gary