Date:      Mon, 11 Mar 2002 23:19:53 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Garance A Drosihn <drosih@rpi.edu>
Cc:        Robert Watson <rwatson@FreeBSD.ORG>, Harti Brandt <brandt@fokus.gmd.de>, Poul-Henning Kamp <phk@critter.freebsd.dk>, arch@FreeBSD.ORG
Subject:   Re: Increasing the size of dev_t and ino_t
Message-ID:  <3C8DAC19.B1ED58B@mindspring.com>
References:  <Pine.NEB.3.96L.1020311160835.46602A-100000@fledge.watson.org> <p0510154eb8b2ed99898c@[128.113.24.47]>

Garance A Drosihn wrote:
> If UFS2 requires a 64-bit (u)ino_t, then we're going to have to
> make some kind of change to the struct returned by stat().  We
> have also talked about wanting 64-bit fields for time values in
> that same struct.  The more I think about it, the more I think
> we should just move towards a 64-bit field for (u)dev_t at the
> same time.  Maybe we should wrap these all up into one major
> change, so we can have a st_dev+st_ino which can handle all
> existing filesystems (with some room for expansion).

It's more complicated than just struct stat, I think.

The "struct dirent" that's returned by getdirentries(2) contains
a 32 bit file ID on the FS (d_fileno).

This is actually equal to the value of st_ino from the stat,
and the man page makes it clear that this is the case, at least
as far as the st_ino semantics that Garrett Wollman quoted out
of the POSIX spec. (more recent than my 1988 copy).  The man
page says:

     The d_fileno entry is a number which is unique for each
     distinct file in the filesystem.  Files that are linked
     by hard links (see link(2)) have the same d_fileno.
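
A minimal sketch of how a userland consumer sees this value,
assuming the portable readdir(3) wrapper rather than a direct
getdirentries(2) call.  The helper name lookup_fileno() is
hypothetical.  POSIX spells the field d_ino; on the BSDs d_ino
is an alias for d_fileno.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>
#include <sys/types.h>

/*
 * Scan directory `dir` for an entry called `name` and store its
 * file number in *out.  Returns 0 on success, -1 if the directory
 * cannot be opened or no entry has that name.
 */
int
lookup_fileno(const char *dir, const char *name, ino_t *out)
{
	DIR *dp;
	struct dirent *de;
	int rv = -1;

	assert(dir != NULL && name != NULL && out != NULL);
	dp = opendir(dir);
	if (dp == NULL)
		return (-1);
	while ((de = readdir(dp)) != NULL) {
		if (strcmp(de->d_name, name) == 0) {
			*out = de->d_ino;	/* d_fileno on the BSDs */
			rv = 0;
			break;
		}
	}
	closedir(dp);
	return (rv);
}
```

Calling this for two names that are hard links to the same file
should yield the same number, and that number can be compared
with st_ino from lstat(2) on the same path.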

The struct dirent is actually a system version of the directory
information, intended to be FS independent.  It's externalized
from the on disk directory structure.  It's "coincidental" that
it matches the FFS on disk structure.

One of the issues here is that it *does not* match the
externalized value that's sent over NFS; among other things,
this is the reason for the "cookie" argument to the VOP_READDIR
per VFS interface.

A side issue (not worth discussing at this point, but worth
keeping in mind) is that there is also a fundamental assumption
in this interface that all directory entries within a directory
are on the same volume.  This actually is not true for the
entries which are directories which have been used as mount
points, and may also not be true for a translucent FS with
e.g. a CDROM and a separate FFS image unioned to make the CDROM
image writeable.  It may also not be true on a per file basis,
if the moral equivalent of symlinks are implemented in the
lookup space, rather than in the FS namespace (e.g. folding of
the namespace for various purposes).  So it's probably a good
idea to rethink this interface in any case, to externalize the
per-file st_dev information, as well (if the interface is going
to be changing anyway, it might as well be more correctly "a
collection of stat information").
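
One way to picture a directory entry that carries per-file
device information is sketched below.  Everything here is an
assumption for illustration: struct xdirent, its field names,
the 64-bit widths and XD_MAXNAMLEN are hypothetical and do not
describe any existing or proposed FreeBSD structure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XD_MAXNAMLEN 255	/* assumed name limit for the sketch */

/* Hypothetical externalized entry: identity is (dev, fileno). */
struct xdirent {
	uint64_t xd_dev;	/* volume the entry actually lives on */
	uint64_t xd_fileno;	/* file number within that volume */
	uint16_t xd_namlen;	/* length of xd_name, without the NUL */
	char	 xd_name[XD_MAXNAMLEN + 1];
};

/* Fill an entry; returns -1 if the name does not fit. */
int
xdirent_fill(struct xdirent *xd, uint64_t dev, uint64_t fileno,
    const char *name)
{
	size_t len;

	assert(xd != NULL && name != NULL);
	len = strlen(name);
	if (len > XD_MAXNAMLEN)
		return (-1);
	xd->xd_dev = dev;
	xd->xd_fileno = fileno;
	xd->xd_namlen = (uint16_t)len;
	memcpy(xd->xd_name, name, len + 1);
	return (0);
}

/* Two entries name the same file only if both halves match. */
int
xdirent_same_file(const struct xdirent *a, const struct xdirent *b)
{
	assert(a != NULL && b != NULL);
	return (a->xd_dev == b->xd_dev && a->xd_fileno == b->xd_fileno);
}
```

With the device carried per entry, two entries in one directory
that share a file number but sit on different volumes compare
as different files, which a bare d_fileno cannot express.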

And that's one example of an exposure other than the "stat"
interface.  The POSIX file locking semantics are another,
though the translation (if any) would be internalized in a
layer in the kernel.


So you're not just talking a change to the "stat" structure,
you are talking, minimally, either a conversion function, or
a change to the system representation (to maintain the
historical "coincidental" match between the on disk structure
for UFS2 and the system structure).

This has translational implications, both for the NFS mapping
space and for the ABI modules (e.g. the Linux ABI).

I'll suggest (again) that what wants to happen here is that
the VOP_READDIR needs to be broken into two operations: one
to get a block reference, atomically, and another, to take
a block reference in native format, and convert entries on a
case-by-case basis to an externalized format.
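
A userland sketch of that two-step split, under heavy
assumptions: all of the type and function names below are
hypothetical, the "native" format is invented for the example,
and a plain memcpy() stands in for whatever locking a kernel
would need to make the block fetch atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DB_MAX	4	/* entries per block in this sketch */

struct native_ent {		/* invented FS-native entry */
	uint64_t ne_fileno;
	uint8_t	 ne_namlen;
	char	 ne_name[255];	/* not NUL terminated */
};

struct ext_ent {		/* invented externalized entry */
	uint64_t ee_fileno;
	char	 ee_name[256];
};

struct dirblock {		/* the "block reference" */
	struct native_ent db_ents[DB_MAX];
	size_t db_count;
};

/* Step 1: snapshot up to DB_MAX native entries; returns the next cookie. */
size_t
dir_getblock(const struct native_ent *dir, size_t ndir, size_t start,
    struct dirblock *blk)
{
	size_t n = 0;

	assert(blk != NULL);
	if (start < ndir) {
		n = ndir - start;
		if (n > DB_MAX)
			n = DB_MAX;
		memcpy(blk->db_ents, dir + start, n * sizeof(*dir));
	}
	blk->db_count = n;
	return (start + n);
}

typedef void (*ent_convert_t)(const struct native_ent *, struct ext_ent *);

/* One possible per-entry converter. */
void
native_to_ext(const struct native_ent *ne, struct ext_ent *ee)
{
	ee->ee_fileno = ne->ne_fileno;
	memcpy(ee->ee_name, ne->ne_name, ne->ne_namlen);
	ee->ee_name[ne->ne_namlen] = '\0';
}

/* Step 2: convert the snapshot entry by entry; returns entries written. */
size_t
dir_convert(const struct dirblock *blk, ent_convert_t conv,
    struct ext_ent *out, size_t max)
{
	size_t i;

	for (i = 0; i < blk->db_count && i < max; i++)
		conv(&blk->db_ents[i], &out[i]);
	return (i);
}
```

Because the converter is a parameter, an NFS server or an ABI
module could supply its own externalized format without
touching the block fetch.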

I don't know the UFS2 intent on namespace, but it's likely
that if it's well thought out, it will be two byte Unicode,
so that fixed field length guarantees are maintained (UTF-7
or UTF-8 encoded data with escapes for the path component
separator "/" and the ASCII NUL character are unacceptable,
both from the need to escape the character when it occurs in
a valid multibyte character, and from the inability to make
path component length guarantees, per POSIX).
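
The length half of that argument can be illustrated with a
small sketch.  It assumes each 16-bit unit is one character
(surrogate pairs are ignored), and the function name
ucs2_utf8_len() is hypothetical.  A name of n such characters
always occupies 2 * n bytes in the fixed-width form, while its
UTF-8 size depends on which characters it contains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of bytes needed to encode `n` 16-bit characters as
 * UTF-8: 1 byte below U+0080, 2 bytes below U+0800, else 3.
 */
size_t
ucs2_utf8_len(const uint16_t *s, size_t n)
{
	size_t i, len = 0;

	assert(s != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (s[i] < 0x80)
			len += 1;
		else if (s[i] < 0x800)
			len += 2;
		else
			len += 3;
	}
	return (len);
}
```

A byte limit on a path component therefore maps to a fixed
character count only in the fixed-width encoding.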

So to sum up:

1)	It's not just struct stat

2)	There are real client FS's in common use that will
	be impacted by such a change

3)	There are FS consumers that aren't FS's that will
	also be impacted by such a change, including the
	ABI code

I expect that we will see changes in these areas anyway, but
it's a good idea to keep in mind that these changes are not
as trivial as they might appear on casual inspection.  It
would be nice if the breakage were a single event that were
not often repeated (everything at once), and if there were
some backward compatibility strategy well thought out before
it became a dire need.

-- Terry
