Date:      Thu, 1 Mar 2001 21:14:46 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        keichii@peorth.iteration.net
Cc:        areilly@bigpond.net.au (Andrew Reilly), tlambert@primenet.com (Terry Lambert), jonathan@graehl.org (Jonathan Graehl), asmodai@FreeBSD.ORG, i18n@FreeBSD.ORG
Subject:   Re: Unicode, command line options, and configuration files, oh my!
Message-ID:  <200103012114.OAA06019@usr05.primenet.com>
In-Reply-To: <20010301095049.A10822@peorth.iteration.net> from "Michael C . Wu" at Mar 01, 2001 09:50:49 AM

> | > | In general, this means that for Unicode data stored for
> | > | directory entries would require that a directory entry
> | > | block would have to be 512b, whereas for UTF-8, we are
> | > | talking 2048b (2k).
> | 
> | It would still have to be larger than 512b using a 16-bit
> | encoding, wouldn't it?
> 
> Yes, and if we are making it larger than 512b, why do we need
> to set a limit on ourselves?

Directory entry block I/O is not handled through the normal
VFS code.  This is because the directory entry blocks need to
be modified atomically, and FS blocks can span page boundaries;
for a sufficiently large FS block size, frags can exceed the
page size.  For some architectures, the page size is not 4k.
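
To put numbers on the frag-versus-page point, here is a minimal
sketch.  It assumes FFS allows at most 8 frags per block, so the
smallest frag for a given block size is one eighth of the block.
The function names and the sizes tried are illustrative only and
are not taken from the UFS sources.

```python
MAX_FRAGS_PER_BLOCK = 8  # assumption: FFS block:frag ratio is at most 8:1


def min_frag_size(block_size):
    """Smallest frag size available for a given FS block size."""
    return block_size // MAX_FRAGS_PER_BLOCK


def frag_exceeds_page(block_size, page_size=4096):
    """True if even the smallest frag is larger than one page."""
    return min_frag_size(block_size) > page_size


if __name__ == "__main__":
    # Hypothetical block sizes, checked against 4k and 8k pages.
    for bsize in (8192, 16384, 32768, 65536):
        for psize in (4096, 8192):
            print(bsize, psize, min_frag_size(bsize),
                  frag_exceeds_page(bsize, psize))
```

Under these assumptions a 64k block gives an 8k minimum frag, which
is larger than a 4k page but exactly fits an 8k page.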

You need to look at the UFS directory manipulation code in the
/sys/ufs/ufs directory so that you can understand the problem;
while you are at it, look at the fsck and newfs and other FS
utility code which has to deal with directory entry blocks.

It is not pretty.  It would be nearly impossible to do directory
I/O in FS blocks, and keep it atomic.  There is already the risk
of a 1024b directory entry spanning a track boundary, because we
do not read mode page 2 from SCSI, and prohibit track spanning
by FS objects.


> | How do you propose to do that and still maintain Unix inode/link
> | semantics?  There isn't (necessarily) only one file name that
> | the user sees, but there _is_ only one lump of file data.
> 
> Do you see why nobody has been able to solve all this stuff easily?

Wrong; Matt Day, Mark Muhelestein, and I solved exactly
this problem in exactly the FreeBSD VFS architecture and exactly
the FreeBSD FFS and UFS code back in 1997.

> I think having a journaling filesystem could solve this.

So can UFS/FFS.  Journalling has nothing to do with the underlying
problem here, which is conversion from a fixed length storage to
a variable length storage, where the underlying media has fixed
length blocks into which you have to map things.
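
A small illustration of the fixed-versus-variable length point,
assuming the fixed-length form is a 16-bit encoding (UTF-16
restricted to the Basic Multilingual Plane) and the variable-length
form is UTF-8.  The sample names and the helper function are made
up for the example.

```python
def encoded_sizes(name):
    """Return (bytes as 16-bit UTF-16LE, bytes as UTF-8) for a name."""
    return len(name.encode("utf-16-le")), len(name.encode("utf-8"))


if __name__ == "__main__":
    # Both names are three characters long.
    for name in ("abc", "日本語"):
        fixed, variable = encoded_sizes(name)
        print(f"{len(name)} chars: 16-bit = {fixed} bytes, "
              f"UTF-8 = {variable} bytes")
```

Both names take 6 bytes in the 16-bit form, but 3 and 9 bytes in
UTF-8, so the space a name needs in a fixed-size directory block can
no longer be derived from its character count alone.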

Consider a CDROM FS for music and video, running in a file set
up as a device.  The blocks of such an FS could not be aligned
within a page, since they are odd sized.  How do you mmap() an
object in such an FS?
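
The alignment problem can be sketched as follows.  The 2352-byte
block size is an assumption (the raw CD audio sector size, used here
as an example of an "odd sized" block), as is the 4k page; the
function names are hypothetical.

```python
PAGE_SIZE = 4096   # assumption: 4k pages
BLOCK_SIZE = 2352  # assumption: raw CD audio sector as the FS block


def spans_page_boundary(block_no, block_size=BLOCK_SIZE,
                        page_size=PAGE_SIZE):
    """True if the block's first and last bytes lie in different pages."""
    start = block_no * block_size
    end = start + block_size - 1
    return start // page_size != end // page_size


def count_spanning(n_blocks, block_size=BLOCK_SIZE, page_size=PAGE_SIZE):
    """Number of the first n_blocks blocks that straddle a page boundary."""
    return sum(spans_page_boundary(i, block_size, page_size)
               for i in range(n_blocks))


if __name__ == "__main__":
    print(count_spanning(100))                   # odd sized blocks
    print(count_spanning(100, block_size=2048))  # power-of-two blocks
```

With a block size that divides the page size evenly, no block ever
straddles a page; with the odd size, blocks 1 and 3 already do.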


> NTFS gives up the ability to switch charsets in the harddrives.
> (It is a pretty good assumption, since most users stay within
> two languages.)  And most of the userland tools, even the simple ones,
> work with other languages without modifications, when compiled
> by Visual Studio.

The OLE character types are 16 bit.  Some of these interfaces are
not available in all WIN32.DLL implementations.

> Java uses a weird scheme to negotiate the contents, where
> the server and the client both have to agree in the charset.
> Then you have to wrap strings in special functions. Then you
> have to specifically tell java that the input is "international" input.
> bla bla bla....Generally bad design and a big hassle.
> (Have you ever seen a Chinese/Japanese/Korean java-enabled website
>  that _works_? I have seen very very few.)

That's because it considers any I/O to be externalization; that's
a stupid assumption.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-i18n" in the body of the message



