Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 26 Jun 2014 15:32:55 -0500
From:      Bryan Drewery <bdrewery@FreeBSD.org>
To:        freebsd-fs@freebsd.org
Subject:   Re: getdirentries cookies usage outside of UFS [PATCH]
Message-ID:  <53AC8377.4060408@FreeBSD.org>
In-Reply-To: <201404142227.s3EMRwIL080960@chez.mckusick.com>
References:  <201404142227.s3EMRwIL080960@chez.mckusick.com>

next in thread | previous in thread | raw e-mail | index | archive | help
This is a multi-part message in MIME format.
--------------000609010903090106080801
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 2014-04-14 17:27, Kirk McKusick wrote:
>> Date: Fri, 11 Apr 2014 21:03:57 -0500
>> From: Bryan Drewery <bdrewery@freebsd.org>
>> To: freebsd-fs@freebsd.org
>> Subject: getdirentries cookies usage outside of UFS
>>
>> Recently I was working on a compat syscall for sys_getdirentries()
>> that
>> converts between our dirent and the FreeBSD dirent struct. We had
>> never
>> tried using this on TMPFS and when we did ran into weird issues (hence
>> my recent commits to TMPFS to clarify some of the getdirentries()
>> code).
>> We were not using cookies, so I referenced the Linux compat module
>> (linux_file.c getdents_common())
>>
>> I ran across this code:
>>
>>>     /*
>>>      * When using cookies, the vfs has the option of reading from
>>>      * a different offset than that supplied (UFS truncates the
>>>      * offset to a block boundary to make sure that it never reads
>>>      * partway through a directory entry, even if the directory
>>>      * has been compacted).
>>>      */
>>>     while (len > 0 && ncookies > 0 && *cookiep <= off) {
>>>             bdp = (struct dirent *) inp;
>>>             len -= bdp->d_reclen;
>>>             inp += bdp->d_reclen;
>>>             cookiep++;
>>>             ncookies--;
>>>     }=20
>>
>>
>> At first it looked innocuous but then it occurred to me it was the
>> root
>> of the issue I was having as it was eating my cookies based on their
>> value, despite tmpfs cookies being random hash values that have no
>> sequential relation. So I looked at how NFS was handling the same code
>> and found this lovely hack from r216691:
>>
>>> not_zfs = strcmp(vp->v_mount->mnt_vfc->vfc_name, "zfs");
>> =2E..
>>>  while (cpos < cend && ncookies > 0 &&
>>>      (dp->d_fileno == 0 || dp->d_type == DT_WHT ||
>>>       (not_zfs != 0 && ((u_quad_t)(*cookiep)) <= toff))) {
>>>          cpos += dp->d_reclen;
>>>          dp = (struct dirent *)cpos;
>>>          cookiep++;
>>>          ncookies--;
>>>  }
>>
>> I ended up doing the opposite, only running the code if getting
>> dirents
>> from "ufs".
>>
>> So there's multiple issue here.
>>
>> 1. NFS is broken on TMPFS. I can see why it's gone so long unnoticed,
>> why would you do that? Still probably worth fixing.
>>
>> 2. Linux and SVR4 getdirentries() are both broken on TMPFS/ZFS. I am
>> surprised Linux+ZFS has not been noticed by now. I am aware the SVR4
>> is
>> full of other bugs too. I ran across many just reviewing the
>> getdirentries code alone.
>>
>> Do any other file systems besides UFS do this offset/cookie
>> truncation/rewind? If UFS is the only one it may be acceptable to
>> change
>> this zfs check to !ufs and add it to the other modules. If we don't
>> like
>> that, or there are potentially other file systems doing this too, how
>> about adding a flag to somewhere to indicate the file system has
>> monotonically increasing offsets and needs this rewind support. I'm
>> not
>> sure where that is best done, struct vfsconf?
>>
>> Regards,
>> Bryan Drewery
>
> This code is specific to UFS. I concur with your fix of making
> it conditionl on UFS. I feel guilty for putting that code in
> unconditionally in the first place. In my defense it was 1982
> and UFS was the only filesystem :-)
>
> 	Kirk McKusick

Based on this discussion I have made the following patch against NFS. If 
we're comfortable with this approach I will apply the same logic to the 
Linux and SVR4 modules.

Mirrored at http://people.freebsd.org/~bdrewery/patches/nfs-zfs-ufs.patch

The patch changes 'not_zfs' to 'is_ufs' in the NFS code. Some of the 
code actually is ZFS-specific in regards to snapshot handling. So I have 
also added a 'is_zfs' variable to compare against for those cases.

I've removed the comments about ZFS in the UFS-only cases as the 
existing comment seems to cover it fine.

(Unrelated) This code, from r259845, in sys/fs/nfsserver/nfs_nfsdport.c 
seems odd to me:

         /*
          * Check to see if entries in this directory can be safely acquired
          * via VFS_VGET() or if a switch to VOP_LOOKUP() is required.
          * ZFS snapshot directories need VOP_LOOKUP(), so that any
          * automount of the snapshot directory that is required will
          * be done.
          * This needs to be done here for NFSv4, since NFSv4 never does
          * a VFS_VGET() for "." or "..".
          */
-       if (not_zfs == 0) {
+       if (is_zfs == 1) {
                 r = VFS_VGET(mp, at.na_fileid, LK_SHARED, &nvp);
                 if (r == EOPNOTSUPP) {
                         usevget = 0;
                         cn.cn_nameiop = LOOKUP;
                         cn.cn_lkflags = LK_SHARED | LK_RETRY;
                         cn.cn_cred = nd->nd_cred;
                         cn.cn_thread = p;
                 } else if (r == 0)
                         vput(nvp);
         }

This fallback is also done later (from r199715), but not limited to 
ZFS. Would it make sense to not limit this first check to ZFS as well? I 
see that unionfs_vget also returns EOPNOTSUPP. A nullfs mount from ZFS 
served over NFS may also return EOPNOTSUPP, as odd as that is.


-- 
Regards,
Bryan Drewery

--------------000609010903090106080801
Content-Type: text/plain; charset=UTF-8; x-mac-type="0"; x-mac-creator="0";
 name="nfs-zfs-ufs.patch"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="nfs-zfs-ufs.patch"

ZGlmZiAtLWdpdCBzeXMvZnMvbmZzc2VydmVyL25mc19uZnNkcG9ydC5jIHN5cy9mcy9uZnNz
ZXJ2ZXIvbmZzX25mc2Rwb3J0LmMKaW5kZXggMDk3OTg2NC4uYzc5OGU1ZCAxMDA2NDQKLS0t
IHN5cy9mcy9uZnNzZXJ2ZXIvbmZzX25mc2Rwb3J0LmMKKysrIHN5cy9mcy9uZnNzZXJ2ZXIv
bmZzX25mc2Rwb3J0LmMKQEAgLTE1NTEsNyArMTU1MSw3IEBAIG5mc3J2ZF9yZWFkZGlyKHN0
cnVjdCBuZnNydl9kZXNjcmlwdCAqbmQsIGludCBpc2RncmFtLAogCXVfbG9uZyAqY29va2ll
cyA9IE5VTEwsICpjb29raWVwOwogCXN0cnVjdCB1aW8gaW87CiAJc3RydWN0IGlvdmVjIGl2
OwotCWludCBub3RfemZzOworCWludCBpc191ZnM7CiAKIAlpZiAobmQtPm5kX3JlcHN0YXQp
IHsKIAkJbmZzcnZfcG9zdG9wYXR0cihuZCwgZ2V0cmV0LCAmYXQpOwpAQCAtMTYwNiw3ICsx
NjA2LDcgQEAgbmZzcnZkX3JlYWRkaXIoc3RydWN0IG5mc3J2X2Rlc2NyaXB0ICpuZCwgaW50
IGlzZGdyYW0sCiAJCQluZnNydl9wb3N0b3BhdHRyKG5kLCBnZXRyZXQsICZhdCk7CiAJCWdv
dG8gb3V0OwogCX0KLQlub3RfemZzID0gc3RyY21wKHZwLT52X21vdW50LT5tbnRfdmZjLT52
ZmNfbmFtZSwgInpmcyIpOworCWlzX3VmcyA9IHN0cmNtcCh2cC0+dl9tb3VudC0+bW50X3Zm
Yy0+dmZjX25hbWUsICJ1ZnMiKSA9PSAwOwogCU1BTExPQyhyYnVmLCBjYWRkcl90LCBzaXos
IE1fVEVNUCwgTV9XQUlUT0spOwogYWdhaW46CiAJZW9mZmxhZyA9IDA7CkBAIC0xNjg2LDEy
ICsxNjg2LDEwIEBAIGFnYWluOgogCSAqIHNraXAgb3ZlciB0aGUgcmVjb3JkcyB0aGF0IHBy
ZWNlZGUgdGhlIHJlcXVlc3RlZCBvZmZzZXQuIFRoaXMKIAkgKiByZXF1aXJlcyB0aGUgYXNz
dW1wdGlvbiB0aGF0IGZpbGUgb2Zmc2V0IGNvb2tpZXMgbW9ub3RvbmljYWxseQogCSAqIGlu
Y3JlYXNlLgotCSAqIFNpbmNlIHRoZSBvZmZzZXQgY29va2llcyBkb24ndCBtb25vdG9uaWNh
bGx5IGluY3JlYXNlIGZvciBaRlMsCi0JICogdGhpcyBpcyBub3QgZG9uZSB3aGVuIFpGUyBp
cyB0aGUgZmlsZSBzeXN0ZW0uCiAJICovCiAJd2hpbGUgKGNwb3MgPCBjZW5kICYmIG5jb29r
aWVzID4gMCAmJgogCSAgICAoZHAtPmRfZmlsZW5vID09IDAgfHwgZHAtPmRfdHlwZSA9PSBE
VF9XSFQgfHwKLQkgICAgIChub3RfemZzICE9IDAgJiYgKCh1X3F1YWRfdCkoKmNvb2tpZXAp
KSA8PSB0b2ZmKSkpIHsKKwkgICAgIChpc191ZnMgPT0gMSAmJiAoKHVfcXVhZF90KSgqY29v
a2llcCkpIDw9IHRvZmYpKSkgewogCQljcG9zICs9IGRwLT5kX3JlY2xlbjsKIAkJZHAgPSAo
c3RydWN0IGRpcmVudCAqKWNwb3M7CiAJCWNvb2tpZXArKzsKQEAgLTE4MDQsNyArMTgwMiw3
IEBAIG5mc3J2ZF9yZWFkZGlycGx1cyhzdHJ1Y3QgbmZzcnZfZGVzY3JpcHQgKm5kLCBpbnQg
aXNkZ3JhbSwKIAlzdHJ1Y3QgdWlvIGlvOwogCXN0cnVjdCBpb3ZlYyBpdjsKIAlzdHJ1Y3Qg
Y29tcG9uZW50bmFtZSBjbjsKLQlpbnQgYXRfcm9vdCwgbmVlZHNfdW5idXN5LCBub3RfemZz
LCBzdXBwb3J0c19uZnN2NGFjbHM7CisJaW50IGF0X3Jvb3QsIGlzX3VmcywgaXNfemZzLCBu
ZWVkc191bmJ1c3ksIHN1cHBvcnRzX25mc3Y0YWNsczsKIAlzdHJ1Y3QgbW91bnQgKm1wLCAq
bmV3X21wOwogCXVpbnQ2NF90IG1vdW50ZWRfb25fZmlsZW5vOwogCkBAIC0xODg0LDcgKzE4
ODIsOCBAQCBuZnNydmRfcmVhZGRpcnBsdXMoc3RydWN0IG5mc3J2X2Rlc2NyaXB0ICpuZCwg
aW50IGlzZGdyYW0sCiAJCQluZnNydl9wb3N0b3BhdHRyKG5kLCBnZXRyZXQsICZhdCk7CiAJ
CWdvdG8gb3V0OwogCX0KLQlub3RfemZzID0gc3RyY21wKHZwLT52X21vdW50LT5tbnRfdmZj
LT52ZmNfbmFtZSwgInpmcyIpOworCWlzX3VmcyA9IHN0cmNtcCh2cC0+dl9tb3VudC0+bW50
X3ZmYy0+dmZjX25hbWUsICJ1ZnMiKSA9PSAwOworCWlzX3pmcyA9IHN0cmNtcCh2cC0+dl9t
b3VudC0+bW50X3ZmYy0+dmZjX25hbWUsICJ6ZnMiKSA9PSAwOwogCiAJTUFMTE9DKHJidWYs
IGNhZGRyX3QsIHNpeiwgTV9URU1QLCBNX1dBSVRPSyk7CiBhZ2FpbjoKQEAgLTE5NTcsMTIg
KzE5NTYsMTAgQEAgYWdhaW46CiAJICogc2tpcCBvdmVyIHRoZSByZWNvcmRzIHRoYXQgcHJl
Y2VkZSB0aGUgcmVxdWVzdGVkIG9mZnNldC4gVGhpcwogCSAqIHJlcXVpcmVzIHRoZSBhc3N1
bXB0aW9uIHRoYXQgZmlsZSBvZmZzZXQgY29va2llcyBtb25vdG9uaWNhbGx5CiAJICogaW5j
cmVhc2UuCi0JICogU2luY2UgdGhlIG9mZnNldCBjb29raWVzIGRvbid0IG1vbm90b25pY2Fs
bHkgaW5jcmVhc2UgZm9yIFpGUywKLQkgKiB0aGlzIGlzIG5vdCBkb25lIHdoZW4gWkZTIGlz
IHRoZSBmaWxlIHN5c3RlbS4KIAkgKi8KIAl3aGlsZSAoY3BvcyA8IGNlbmQgJiYgbmNvb2tp
ZXMgPiAwICYmCiAJICAoZHAtPmRfZmlsZW5vID09IDAgfHwgZHAtPmRfdHlwZSA9PSBEVF9X
SFQgfHwKLQkgICAobm90X3pmcyAhPSAwICYmICgodV9xdWFkX3QpKCpjb29raWVwKSkgPD0g
dG9mZikgfHwKKwkgICAoaXNfdWZzID09IDEgJiYgKCh1X3F1YWRfdCkoKmNvb2tpZXApKSA8
PSB0b2ZmKSB8fAogCSAgICgobmQtPm5kX2ZsYWcgJiBORF9ORlNWNCkgJiYKIAkgICAgKChk
cC0+ZF9uYW1sZW4gPT0gMSAmJiBkcC0+ZF9uYW1lWzBdID09ICcuJykgfHwKIAkgICAgIChk
cC0+ZF9uYW1sZW49PTIgJiYgZHAtPmRfbmFtZVswXT09Jy4nICYmIGRwLT5kX25hbWVbMV09
PScuJykpKSkpIHsKQEAgLTIwMDQsNyArMjAwMSw3IEBAIGFnYWluOgogCSAqIFRoaXMgbmVl
ZHMgdG8gYmUgZG9uZSBoZXJlIGZvciBORlN2NCwgc2luY2UgTkZTdjQgbmV2ZXIgZG9lcwog
CSAqIGEgVkZTX1ZHRVQoKSBmb3IgIi4iIG9yICIuLiIuCiAJICovCi0JaWYgKG5vdF96ZnMg
PT0gMCkgeworCWlmIChpc196ZnMgPT0gMSkgewogCQlyID0gVkZTX1ZHRVQobXAsIGF0Lm5h
X2ZpbGVpZCwgTEtfU0hBUkVELCAmbnZwKTsKIAkJaWYgKHIgPT0gRU9QTk9UU1VQUCkgewog
CQkJdXNldmdldCA9IDA7CkBAIC0yMTUzLDcgKzIxNTAsNyBAQCBhZ2FpbjoKIAkJCQkJaWYg
KCFyKQogCQkJCQkgICAgciA9IG5mc3Zub19nZXRhdHRyKG52cCwgbnZhcCwKIAkJCQkJCW5k
LT5uZF9jcmVkLCBwLCAxKTsKLQkJCQkJaWYgKHIgPT0gMCAmJiBub3RfemZzID09IDAgJiYK
KwkJCQkJaWYgKHIgPT0gMCAmJiBpc196ZnMgPT0gMSAmJgogCQkJCQkgICAgbmZzcnZfZW5h
YmxlX2Nyb3NzbW50cHQgIT0gMCAmJgogCQkJCQkgICAgKG5kLT5uZF9mbGFnICYgTkRfTkZT
VjQpICE9IDAgJiYKIAkJCQkJICAgIG52cC0+dl90eXBlID09IFZESVIgJiYKZGlmZiAtLWdp
dCBzeXMvbmZzc2VydmVyL25mc19zZXJ2LmMgc3lzL25mc3NlcnZlci9uZnNfc2Vydi5jCmlu
ZGV4IGRlNTg0MjEuLjEwMTBkYTYgMTAwNjQ0Ci0tLSBzeXMvbmZzc2VydmVyL25mc19zZXJ2
LmMKKysrIHN5cy9uZnNzZXJ2ZXIvbmZzX3NlcnYuYwpAQCAtMjYyNyw3ICsyNjI3LDcgQEAg
bmZzcnZfcmVhZGRpcihzdHJ1Y3QgbmZzcnZfZGVzY3JpcHQgKm5mc2QsIHN0cnVjdCBuZnNz
dmNfc29jayAqc2xwLAogCWludCB2MyA9IChuZnNkLT5uZF9mbGFnICYgTkRfTkZTVjMpOwog
CXVfcXVhZF90IG9mZiwgdG9mZiwgdmVyZjsKIAl1X2xvbmcgKmNvb2tpZXMgPSBOVUxMLCAq
Y29va2llcDsgLyogbmVlZHMgdG8gYmUgaW50NjRfdCBvciBvZmZfdCAqLwotCWludCBub3Rf
emZzOworCWludCBpc191ZnM7CiAKIAluZnNkYnByaW50ZigoIiVzICVkXG4iLCBfX0ZJTEVf
XywgX19MSU5FX18pKTsKIAlmaHAgPSAmbmZoLmZoX2dlbmVyaWM7CkBAIC0yNjkwLDcgKzI2
OTAsNyBAQCBuZnNydl9yZWFkZGlyKHN0cnVjdCBuZnNydl9kZXNjcmlwdCAqbmZzZCwgc3Ry
dWN0IG5mc3N2Y19zb2NrICpzbHAsCiAJCWVycm9yID0gMDsKIAkJZ290byBuZnNtb3V0Owog
CX0KLQlub3RfemZzID0gc3RyY21wKHZwLT52X21vdW50LT5tbnRfdmZjLT52ZmNfbmFtZSwg
InpmcyIpICE9IDA7CisJaXNfdWZzID0gc3RyY21wKHZwLT52X21vdW50LT5tbnRfdmZjLT52
ZmNfbmFtZSwgInVmcyIpID09IDA7CiAJVk9QX1VOTE9DSyh2cCwgMCk7CiAKIAkvKgpAQCAt
Mjc3NywxMiArMjc3NywxMCBAQCBhZ2FpbjoKIAkgKiBza2lwIG92ZXIgdGhlIHJlY29yZHMg
dGhhdCBwcmVjZWRlIHRoZSByZXF1ZXN0ZWQgb2Zmc2V0LiBUaGlzCiAJICogcmVxdWlyZXMg
dGhlIGFzc3VtcHRpb24gdGhhdCBmaWxlIG9mZnNldCBjb29raWVzIG1vbm90b25pY2FsbHkK
IAkgKiBpbmNyZWFzZS4KLQkgKiBTaW5jZSB0aGUgb2Zmc2V0IGNvb2tpZXMgZG9uJ3QgbW9u
b3RvbmljYWxseSBpbmNyZWFzZSBmb3IgWkZTLAotCSAqIHRoaXMgaXMgbm90IGRvbmUgd2hl
biBaRlMgaXMgdGhlIGZpbGUgc3lzdGVtLgogCSAqLwogCXdoaWxlIChjcG9zIDwgY2VuZCAm
JiBuY29va2llcyA+IDAgJiYKIAkJKGRwLT5kX2ZpbGVubyA9PSAwIHx8IGRwLT5kX3R5cGUg
PT0gRFRfV0hUIHx8Ci0JCSAobm90X3pmcyAhPSAwICYmICgodV9xdWFkX3QpKCpjb29raWVw
KSkgPD0gdG9mZikpKSB7CisJCSAoaXNfdWZzID09IDEgJiYgKCh1X3F1YWRfdCkoKmNvb2tp
ZXApKSA8PSB0b2ZmKSkpIHsKIAkJY3BvcyArPSBkcC0+ZF9yZWNsZW47CiAJCWRwID0gKHN0
cnVjdCBkaXJlbnQgKiljcG9zOwogCQljb29raWVwKys7CkBAIC0yOTI4LDcgKzI5MjYsNyBA
QCBuZnNydl9yZWFkZGlycGx1cyhzdHJ1Y3QgbmZzcnZfZGVzY3JpcHQgKm5mc2QsIHN0cnVj
dCBuZnNzdmNfc29jayAqc2xwLAogCWludCB1c2V2Z2V0ID0gMTsKIAlzdHJ1Y3QgY29tcG9u
ZW50bmFtZSBjbjsKIAlzdHJ1Y3QgbW91bnQgKm1udHAgPSBOVUxMOwotCWludCBub3RfemZz
OworCWludCBpc191ZnM7CiAKIAluZnNkYnByaW50ZigoIiVzICVkXG4iLCBfX0ZJTEVfXywg
X19MSU5FX18pKTsKIAl2cF9sb2NrZWQgPSAwOwpAQCAtMjk4OCw3ICsyOTg2LDcgQEAgbmZz
cnZfcmVhZGRpcnBsdXMoc3RydWN0IG5mc3J2X2Rlc2NyaXB0ICpuZnNkLCBzdHJ1Y3QgbmZz
c3ZjX3NvY2sgKnNscCwKIAkJZXJyb3IgPSAwOwogCQlnb3RvIG5mc21vdXQ7CiAJfQotCW5v
dF96ZnMgPSBzdHJjbXAodnAtPnZfbW91bnQtPm1udF92ZmMtPnZmY19uYW1lLCAiemZzIikg
IT0gMDsKKwlpc191ZnMgPSBzdHJjbXAodnAtPnZfbW91bnQtPm1udF92ZmMtPnZmY19uYW1l
LCAidWZzIikgPT0gMDsKIAlWT1BfVU5MT0NLKHZwLCAwKTsKIAl2cF9sb2NrZWQgPSAwOwog
CXJidWYgPSBtYWxsb2Moc2l6LCBNX1RFTVAsIE1fV0FJVE9LKTsKQEAgLTMwNjgsMTIgKzMw
NjYsMTAgQEAgYWdhaW46CiAJICogc2tpcCBvdmVyIHRoZSByZWNvcmRzIHRoYXQgcHJlY2Vk
ZSB0aGUgcmVxdWVzdGVkIG9mZnNldC4gVGhpcwogCSAqIHJlcXVpcmVzIHRoZSBhc3N1bXB0
aW9uIHRoYXQgZmlsZSBvZmZzZXQgY29va2llcyBtb25vdG9uaWNhbGx5CiAJICogaW5jcmVh
c2UuCi0JICogU2luY2UgdGhlIG9mZnNldCBjb29raWVzIGRvbid0IG1vbm90b25pY2FsbHkg
aW5jcmVhc2UgZm9yIFpGUywKLQkgKiB0aGlzIGlzIG5vdCBkb25lIHdoZW4gWkZTIGlzIHRo
ZSBmaWxlIHN5c3RlbS4KIAkgKi8KIAl3aGlsZSAoY3BvcyA8IGNlbmQgJiYgbmNvb2tpZXMg
PiAwICYmCiAJCShkcC0+ZF9maWxlbm8gPT0gMCB8fCBkcC0+ZF90eXBlID09IERUX1dIVCB8
fAotCQkgKG5vdF96ZnMgIT0gMCAmJiAoKHVfcXVhZF90KSgqY29va2llcCkpIDw9IHRvZmYp
KSkgeworCQkgKGlzX3VmcyA9PSAxICYmICgodV9xdWFkX3QpKCpjb29raWVwKSkgPD0gdG9m
ZikpKSB7CiAJCWNwb3MgKz0gZHAtPmRfcmVjbGVuOwogCQlkcCA9IChzdHJ1Y3QgZGlyZW50
ICopY3BvczsKIAkJY29va2llcCsrOwo=
--------------000609010903090106080801--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?53AC8377.4060408>