From owner-freebsd-fs@FreeBSD.ORG Mon Apr 14 14:26:43 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 45C94FDB; Mon, 14 Apr 2014 14:26:43 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id CF8371D18; Mon, 14 Apr 2014 14:26:42 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqEEAI7vS1ODaFve/2dsb2JhbABZhBiDEMARgTh0giUBAQEDJARLBwUWDgoCAg0ZAl+IBwiocKJVF4EpjRE0gnaBSQSUdJYwg00hgW4 X-IronPort-AV: E=Sophos;i="4.97,857,1389762000"; d="scan'208";a="114292967" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 14 Apr 2014 10:26:41 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 7A3C8B40A2; Mon, 14 Apr 2014 10:26:41 -0400 (EDT) Date: Mon, 14 Apr 2014 10:26:41 -0400 (EDT) From: Rick Macklem To: Bryan Drewery Message-ID: <7314631.10715155.1397485601490.JavaMail.root@uoguelph.ca> Subject: Re: getdirentries cookies usage outside of UFS MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: FreeBSD Filesystems , Kirk McKusick , Konstantin Belousov X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Apr 2014 14:26:43 -0000 Bryan Drewery wrote: > Recently I was working on a compat syscall for sys_getdirentries() that > converts between our dirent and the FreeBSD dirent struct. We had never > tried using this on TMPFS and when we did ran into weird issues (hence > my recent commits to TMPFS to clarify some of the getdirentries() code). > We were not using cookies, so I referenced the Linux compat module > (linux_file.c getdents_common()) > > I ran across this code: > > /* > > * When using cookies, the vfs has the option of reading from > > * a different offset than that supplied (UFS truncates the > > * offset to a block boundary to make sure that it never reads > > * partway through a directory entry, even if the directory > > * has been compacted). > > */ > > while (len > 0 && ncookies > 0 && *cookiep <= off) { > > bdp =3D (struct dirent *) inp; > > len -=3D bdp->d_reclen; > > inp +=3D bdp->d_reclen; > > cookiep++; > > ncookies--; > > } > > > At first it looked innocuous but then it occurred to me it was the root > of the issue I was having as it was eating my cookies based on their > value, despite tmpfs cookies being random hash values that have no > sequential relation. So I looked at how NFS was handling the same code > and found this lovely hack from r216691: > > > not_zfs =3D strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs"); > >... > > while (cpos < cend && ncookies > 0 && > > (dp->d_fileno =3D=3D 0 || dp->d_type =3D=3D DT_WHT || > > (not_zfs !=3D 0 && ((u_quad_t)(*cookiep)) <=3D toff))) { > > cpos +=3D dp->d_reclen; > > dp =3D (struct dirent *)cpos; > > cookiep++; > > ncookies--; > > } > > I ended up doing the opposite, only running the code if getting dirents > from "ufs". > > So there's multiple issue here. > > 1. NFS is broken on TMPFS. I can see why it's gone so long unnoticed, > why would you do that? Still probably worth fixing. > Well, since exporting a volatile file system over NFS definitely breaks the protocol, the correct fix should probably be to disable TMPFS so that it cannot be exported at all. However, someone probably likes to do this anyhow, so I guess it should be left as is or fixed... > 2. Linux and SVR4 getdirentries() are both broken on TMPFS/ZFS. I am > surprised Linux+ZFS has not been noticed by now. I am aware the SVR4 is > full of other bugs too. I ran across many just reviewing the > getdirentries code alone. > > Do any other file systems besides UFS do this offset/cookie > truncation/rewind? If UFS is the only one it may be acceptable to change > this zfs check to !ufs and add it to the other modules. If we don't like > that, or there are potentially other file systems doing this too, how > about adding a flag to somewhere to indicate the file system has > monotonically increasing offsets and needs this rewind support. I'm not > sure where that is best done, struct vfsconf? > At a glance, I don't think UFS returns directory entries starting at a block boundary any more, either. (It reads from a block boundary, but skips the ones before the startoffset, if I read ufs_readdir() correctly.) As such, I'm not sure this is needed for UFS? However, it is going to be difficult to determine if this loop is no longer necessary for any file system. If if can't be determined that this is no longer necessary at all, a flag indicating a file system uses non-monotonically increasing directory offsets seems the cleanest fix to me. However, this change might be difficult to MFC. If is was a "non-monotonically increasing dir offset" flag that is set, old code would still function the same without knowledge of the flag, so long as the NFS server case was changed to "not_zfs and flag no set" and the "not_zfs" test was still done. (Of course, then the "not_zfs" case will probably be in the sources for decades to come, because no one can be sure it is safe to get rid of;-) It would be nice if it could be determined if any file system still backs up to a block boundary, but I can't answer that. I've cc'd Kirk and Kostik, in case either of them know or have opinions on this? rick ps: You might also take a look for uio_offset < 0 tests, since a -ve offset used to be meaningful. (Sorry, I can't remember when/how this was used, but if the high order bit of a directory offset cookie is set, there might be surprises;-) Maybe this isn't a problem for dirs, if I recall it correctly? > --=20 > Regards, > Bryan Drewery