Date:      Thu, 1 Mar 2001 20:59:43 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        areilly@bigpond.net.au (Andrew Reilly)
Cc:        keichii@peorth.iteration.net (Michael C . Wu), tlambert@primenet.com (Terry Lambert), jonathan@graehl.org (Jonathan Graehl), freebsd-arch@FreeBSD.ORG (freebsd-Arch), i18n@FreeBSD.ORG
Subject:   Re: Unicode, command line options, and configuration files, oh my!
Message-ID:  <200103012059.NAA05439@usr05.primenet.com>
In-Reply-To: <20010301174513.A65013@gurney.reilly.home> from "Andrew Reilly" at Mar 01, 2001 05:45:13 PM

> > | In general, this means that Unicode data stored for
> > | directory entries would require a directory entry
> > | block of 512b, whereas for UTF-8, we are
> > | talking 2048b (2k).
> 
> It would still have to be larger than 512b using a 16-bit
> encoding, wouldn't it?

Yes; 1024b; sorry about that, it was an error.  The point was
supposed to be that, if you go look at the directory entry code,
it would be a lot easier to implement 1k instead of 2k (we did
this before when we ported the FreeBSD VFS to Windows 95 and
supported both the 256-character Unicode and the 8.3 namespaces
simultaneously).
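
The block sizes above can be checked with some quick arithmetic.  What
follows is only a sketch under stated assumptions: a 255-character name
limit, an 8-byte fixed entry header (inode number, record length, type,
name length), and a block size rounded up to a power of two of at least
512 bytes.  The function name and the per-character byte counts are
illustrative and are not taken from the directory entry code.

```python
HEADER_BYTES = 8       # assumed fixed part of one directory entry
MAX_NAME_CHARS = 255   # assumed name length limit, counted in characters


def entry_block_size(bytes_per_char, max_chars=MAX_NAME_CHARS, header=HEADER_BYTES):
    """Smallest power-of-two block (>= 512 bytes) holding one worst-case entry."""
    needed = header + max_chars * bytes_per_char
    size = 512
    while size < needed:
        size *= 2
    return size


# Worst-case bytes per character for each encoding are assumptions.
for label, width in [
    ("8-bit", 1),
    ("16-bit", 2),
    ("UTF-8, 3 bytes max", 3),
    ("UTF-8, 4 bytes max", 4),
    ("UTF-8, 6 bytes max", 6),
]:
    print(f"{label:20s} {entry_block_size(width)}")
```

Under these assumptions a 16-bit encoding needs 1024b, and UTF-8 needs
2048b as soon as a character may take 4 or more bytes; if names were
limited to characters that encode in 3 bytes, 1024b would be enough.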


> > | If the same approach is used as the current UFS code uses,
> > | then these operations will need to be directory entry block
> > | atomic.
> > 
> > In short, we can save the file name that the user sees
> > with the file data.  The filesystem and the kernel see
> > some other naming scheme determined by the FS/kernel.
> 
> How do you propose to do that and still maintain Unix inode/link
> semantics?  There isn't (necessarily) only one file name that
> the user sees, but there _is_ only one lump of file data.

How do hard links work at all today, under the same conditions?

The directory entry is just a reference to the inode; this is
not like ISO or VFAT, where the directory entry _is_ the inode.
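
The entry-as-reference point can be seen from user space.  This is a
minimal sketch, assuming a POSIX-style filesystem that supports hard
links; the function name and the file names are hypothetical.

```python
import os
import tempfile


def link_demo():
    """Give one file a second name; return (same_inode, link_count, data)."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first-name")
        second = os.path.join(d, "second-name")
        with open(first, "w") as f:
            f.write("one lump of file data\n")
        # Add a second directory entry that refers to the same inode.
        os.link(first, second)
        a, b = os.stat(first), os.stat(second)
        same_inode = (a.st_ino, a.st_dev) == (b.st_ino, b.st_dev)
        # Read the data back through the second name.
        with open(second) as f:
            data = f.read()
        return same_inode, a.st_nlink, data
```

Both names resolve to one inode whose link count is 2, and the data
written through the first name is read back through the second.  Neither
name is stored with the file data.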


> > | On top of that, we have Microsoft and Java interoperability to
> > | consider, distasteful as that may be to some.
> > 
> > M$ has a pretty good implementation here.
> > Java I18N sucks really bad.
> 
> Could you give a quick description of why one of these is good
> and the other bad, for the benefit of someone who knows
> neither?

My take on this, which may not be the same as his, is that the
Microsoft implementation uses the processing representation as
the storage representation, whereas Java uses UTF-8 for the
storage representation.

Java also deals in strings composed of "bytes" instead of strings
composed of "characters", which makes string processing problematic,
if the string is an I18N string; consider that it has no functions
similar to XPG/4 mbtowc() or other interning/externing functions
that it would use to deal with them.
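
The byte/character distinction itself is easy to demonstrate.  The
sketch below uses Python rather than Java or C, purely for illustration;
decoding plays roughly the role that mbtowc() plays in C.  The sample
name and the helper names are hypothetical.

```python
name = "日本語.txt"                # 3 CJK characters plus 4 ASCII characters
stored = name.encode("utf-8")     # storage representation: a string of bytes


def char_count(data):
    """Count characters by converting to the processing representation."""
    return len(data.decode("utf-8"))


def safe_truncate(data, max_bytes):
    """Cut valid UTF-8 to at most max_bytes without splitting a character.

    errors="ignore" drops the incomplete sequence left at the cut point;
    the input is assumed to be valid UTF-8 to begin with.
    """
    return data[:max_bytes].decode("utf-8", errors="ignore").encode("utf-8")


print(len(stored), char_count(stored))
print(safe_truncate(stored, 4).decode("utf-8"))
```

Cutting the stored form at an arbitrary byte offset, such as
stored[:4], leaves a partial character that no longer decodes.  That is
the kind of problem that byte-oriented string handling runs into with
I18N strings.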

It's kind of like the problem with Java letting you instance
objects without a default constructor being required to make
them valid; the JavaMail API is rife with examples of this type
of thing.  You can see it pretty easily, when you try to write
those same interfaces in C++, since C++ doesn't permit that
sort of thing to happen (instancing without initialization is
not possible in C++; there is *always* a default constructor).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.




