Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 29 Dec 2014 08:42:22 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Kirk McKusick <mckusick@mckusick.com>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>, Gleb Kurtsou <gleb@freebsd.org>, Konstantin Belousov <kib@freebsd.org>
Subject:   Re: patch that makes d_fileno 64bits
Message-ID:  <293906398.3337213.1419860542696.JavaMail.root@uoguelph.ca>
In-Reply-To: <201412290702.sBT72I8G087361@chez.mckusick.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Kirk wrote:
> > Date: Sat, 27 Dec 2014 18:32:48 -0500 (EST)
> > From: Rick Macklem <rmacklem@uoguelph.ca>
> > To: FreeBSD Filesystems <freebsd-fs@freebsd.org>,
> >         Kirk McKusick <mckusick@mckusick.com>, Gleb Kurtsou
> >         <gleb@freebsd.org>,
> >         Konstantin Belousov <kib@freebsd.org>
> > Subject: patch that makes d_fileno 64bits
> > 
> > Hi,
> > 
> > Kirk and Gleb Kurtsou (plus some others) are working through
> > the difficult job of changing ino_t to 64bits. (Changes to
> > syscalls,
> > libraries, etc.)
> > 
> > This patch:
> > http://people.freebsd.org/~rmacklem/64bitfileno.patch
> > 
> > is somewhat tangential to the above, in that it changes the
> > d_fileno field of "struct dirent" and va_fileid to uint64_t.
> > It also includes adding a field called d_cookie to "struct dirent",
> > which is the position of the next directory entry in the underlying
> > file system. A majority of this patch are changes to the NFS code,
> > but it includes a simple "new struct dirent"->"old struct dirent32"
> > copy routine for getdirentries(2) and small changes to all the
> > file systems so they fill in the "new struct dirent".
> > 
> > This patch can be applied to head/current and the resultant kernel
> > should work fine, although I've only been able to test some of the
> > file systems. However, DO NOT propagate the changes to
> > sys/sys/dirent.h
> > out to userland (/usr/include/sys/dirent.h) and build a userland
> > from
> > it or things will get badly broken.
> > 
> > I don't know if Kirk and/or Gleb will find some of this useful for
> > their updates to project/ino64, but it will allow people to test
> > these changes. (It modifies the NFS server so that it no longer
> > uses
> > the "cookie" args to VOP_READDIR(), but that part can easily
> > be removed from the patch.)
> > 
> > If folks can test this patch, I think it would be helpful for
> > the effort of changing ino_t to 64bits.
> > 
> > Have fun with it, rick
> 
> Thanks Rick, this does look useful. Since Gleb is leading the charge
> with changing ino_t to 64 bits, I will let him have final say on when
> it would be helpful to have this go into HEAD. But it does seem that
> it should be possible to do it before the other changes and
> independently
> of them since it only chnges the internal kernel interfaces. But
> perhaps
> I am missing something that Gleb or kib can point out.
> 
Well, I don't think it can go into head before the rest as it stands,
because it changes "struct dirent", which then propagates out to
/usr/include/sys and breaks userland.

However, if the patch was changed so that the old one remained "struct dirent"
and the new one called something else like "struct dirent64", then
I think it could go into HEAD. However, the patch would then include
a whole bunch of "struct dirent"-->"struct dirent64" changes in the
kernel. (Not difficult to do. That was the way I started out doing it
and then switched simply to keep the diff smaller.)
Btw, ZFS has exactly that already. They call it "struct dirent64" and
then there is are lines in their dirent.h that:
typedef   struct dirent  dirent64_t
#define   dirent64      dirent
so for ZFS it is just changing these lines.
Oh, and UFS mostly uses "struct direct" except for one little chunk
of code, so it's easy, too.

Doing this might be worth considering, since it would help flush out
any bugs in this area before the big ino_t patch goes in?
(I can easily grind out a version of the patch using "struct dirent64",
 if that would be useful?)

I'll admit I never got to look at Gleb changes that weren't yet
in projects/ino64 (I couldn't get git to work and the URL to github
wouldn't work for me either for some reason;-), so I suspect he'll
want to merge with what he has. (C code is simple, but these new fangled
things like git...scarey for an old guy like me;-)

> It seems to me that the cookies calculation could be taken out of the
> VOP_GETDIRENTRIES interface since NFS is the only client of it.
> 
I think this would be nice, but if you guys want to delay that, the
part of the patch that makes nfsd use "d_cookie" (or whatever you
choose to call it) instead of the cookie args can just be reverted.
(Most of the file systems code will look cleaner and simpler without
 the cookies stuff. In particular, unionfs gets kinda weird with the
 cookie stuff.)

> In looking through your patch, I did not see anything that looked
> wrong.
Yea, it was pretty straightforward and I did spot a couple of weird
bits that Gleb might want to know about.
fuse - Has a bogus line of code (I think was meant to do null termination
       of the name) that actually writes 1 byte past the "struct dirent".
       Fortunately, the fuse code that calculated the buffer size is
       bogus too and makes the buffer larger than necessary so the assignment
       is harmless.
       - I just took out the bogus assignment (since the buffer is already
         bzero()'d) and fixed the buffer sizing.
fdesc - There is a constant UIO_MX, which is the size used for all
       "struct dirent"s that had to be increased.

For va_fileid, there were a few file systems that assigned it a value
from a variable declared "int" or "long". I decided to put a "(u_int)"
or "(u_long)" cast in front of them, just to make sure they didn't
sign extend for 32bit arches if the high order bit was somehow set.
(I didn't do this for fdesc, because I was pretty sure it would never
 have more than 2**31 nodes, but I think it is harmless to do so.)
Alternately, you could put a KASSERT() to make sure the value isn't
negative before the assignment.

Good to see progress on the 64bit ino_t front, rick

> But as you point out, more testing is needed :-)
> 
> 	Kirk McKusick
> 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?293906398.3337213.1419860542696.JavaMail.root>