Date:      Wed, 06 Feb 2013 11:15:54 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        FreeBSD Filesystems <freebsd-fs@FreeBSD.org>, Sergey Kandaurov <pluknet@FreeBSD.org>
Subject:   Re: ZFS lookup of ".." below .zfs returns itself (same vnode as dvp)
Message-ID:  <51121F4A.6050107@FreeBSD.org>
In-Reply-To: <1945675096.2731799.1360113998312.JavaMail.root@erie.cs.uoguelph.ca>
References:  <1945675096.2731799.1360113998312.JavaMail.root@erie.cs.uoguelph.ca>

on 06/02/2013 03:26 Rick Macklem said the following:
> Hi,
> 
> I've been working on a panic/crash that happens when a NFSv4
> mount from a client tries to lookup ".." below a .zfs directory.
> 
> The thread is over on freebsd-current:
> http://docs.FreeBSD.org/cgi/mid.cgi?CAE-mSOLA2J6KteFM-NH9Lb9TfX3rykckkMjguZMQFg4oLx-mWQ

Replicating here what I've just posted there, just in case.

> It seems that, for this case, the lookup of ".." returns itself.
> This causes a panic() when the code in zfs_lookup() tries to re-lock
> the directory, since it is already returned locked. A one line
> change at line #1451 in zfs_vnops.c to
>    if ((cnp->cn_flags & ISDOTDOT) && *vpp != dvp)
> stops the panics, but because I know nothing about ZFS, I don't
> know where to take this. Normally, I would only expect this at
> the root of a file system, but VV_ROOT isn't set for this vnode.
> 
> From reading a few comments, it seems that ZFS returns the snapshot
> directory for this case. I can vaguely see that .zfs is somehow "special".
> 
> Knowing nothing about ZFS, maybe someone can help with answers to
> a few questions and/or suggestions w.r.t. how the NFS server should
> handle this case.
> 
> Is .zfs considered a snapshot directory or is the snapshot directory
> below .zfs?
> 
> I see lookups for the name "snapshot". Is that the actual name of
> this snapshot directory and is it always the same?
> 
> Are these meant to look like normal mount points. If not, I can't
> see how things like getcwd() would work once cd'd to below it?
> 
> Any help with understanding this would be appreciated, rick
> ps: After the one line patch, the server doesn't panic, but it
>     seems to return an empty directory when the "ls /.zfs/shares/"
>     is done by Sergey.

Actually, I think I have an explanation; I have just been busy for the past
couple of days.
The problem is precisely with .zfs/shares, which is a strange beast that
currently has no practical use on FreeBSD.

.zfs/shares has its own on-disk node.  The node has some special properties:
- it is a directory node
- it is not reachable from any other node
- its parent ID is set to itself (as for a root node)
- its ID is stored in a special filesystem property

At run time ZFS creates special vnodes for .zfs, .zfs/snapshot and .zfs/shares.
The vnodes are special in the sense that each of them has its own v_ops,
different from the v_ops of regular ZFS vnodes.
For example, the vop_lookup method of .zfs/shares should return the .zfs vnode
for a ".." lookup.  The v_ops are sane and self-consistent, and everything is
supposed to work fine with them and provide some ".zfs magic".

Except for one hole.  The .zfs/shares vnode has the same inode number as the
on-disk node.  Also, its vop_vptofh generates a fid_t consistent with the
on-disk node.
Then, ZFS vfs_fhtovp has a special case to do the right thing for the fid_t-s
of .zfs and .zfs/snapshot.  But for some reason there is no special code for
.zfs/shares.  As a result, a regular ZFS vnode is created and returned in that
case.

And this is the problem.

A regular zfs_lookup for ".." on this vnode returns the vnode itself, because
the on-disk node's parent ID is set to itself (see the properties listed
above).  The panic described in the original report follows from there.
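
To make the self-parent effect concrete, here is a minimal sketch in C.  It is
a toy model, not ZFS code: the toy_* names, the struct layout and the object
IDs are all made up for illustration.  It only shows why a generic ".." lookup
that follows the stored parent ID hands back the directory itself for a
shares-like node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name and ID here is hypothetical, not ZFS's. */
struct toy_node {
    uint64_t id;        /* object ID of this node */
    uint64_t parent_id; /* object ID of the parent directory */
};

/*
 * A tiny "on-disk" table:
 *   1 = filesystem root (its parent is itself, which is expected for a root)
 *   2 = an ordinary directory under the root
 *   3 = a shares-like node: not a root, yet its parent ID is itself
 */
static struct toy_node toy_table[] = {
    { 1, 1 },
    { 2, 1 },
    { 3, 3 },
};

/* Find a node by object ID; returns NULL if there is no such node. */
static struct toy_node *toy_get(uint64_t id)
{
    for (size_t i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
        if (toy_table[i].id == id)
            return &toy_table[i];
    }
    return NULL;
}

/* Generic ".." lookup: simply follow the stored parent ID. */
static struct toy_node *toy_lookup_dotdot(struct toy_node *dvp)
{
    assert(dvp != NULL);
    return toy_get(dvp->parent_id);
}

/* Returns 1 when ".." resolves to the directory itself, 0 otherwise. */
static int toy_dotdot_is_self(struct toy_node *dvp)
{
    return toy_lookup_dotdot(dvp) == dvp;
}
```

In this model toy_dotdot_is_self() is true for node 3 and false for node 2.
A caller that assumes ".." always yields a different, not yet locked vnode
would try to lock the same object twice, which is the situation that the
quoted "*vpp != dvp" check avoids.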

We seem to have inherited this problem from upstream:
http://thread.gmane.org/gmane.os.illumos.zfs/599

I believe that currently NFS is the only user of VOP_FID and VFS_FHTOVP.
There are also getfh(2), lgetfh(2) and fhopen(2), but I am not sure how widely
they are used.

Either way, I believe that zfs_fhtovp should grow a check for object ==
zfsvfs->z_shares_dir and return the "made up" .zfs/shares vnode in that case
(instead of a regular ZFS vnode).
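
The shape of that check can be sketched as follows.  This is only a toy model
in C and not a patch: the toy_* types and names are hypothetical, and the real
zfs_fhtovp does considerably more work.  The only point is the extra
comparison against the shares object ID before falling back to the regular
vnode path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these types and names are hypothetical, not ZFS's. */
enum toy_vnode_kind {
    TOY_REGULAR,    /* ordinary vnode with the regular v_ops */
    TOY_CTL_SHARES  /* the "made up" .zfs/shares vnode with special v_ops */
};

struct toy_fs {
    uint64_t shares_dir; /* object ID of the shares node (stands in for
                            zfsvfs->z_shares_dir) */
};

struct toy_vnode {
    enum toy_vnode_kind kind;
    uint64_t object;
};

/*
 * Turn an object ID taken from a file handle into a vnode description.
 * The comparison against shares_dir is the proposed extra check: without
 * it, every object would take the TOY_REGULAR path.
 */
static struct toy_vnode toy_fhtovp(const struct toy_fs *fs, uint64_t object)
{
    struct toy_vnode vn;

    assert(fs != NULL);
    vn.object = object;
    vn.kind = (object == fs->shares_dir) ? TOY_CTL_SHARES : TOY_REGULAR;
    return vn;
}
```

With the check in place, a file handle that names the shares object maps back
to the special vnode, whose own lookup method can then return .zfs for "..".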

Additionally, I am not sure, but perhaps zfs_vget() should apply the same kind
of special-casing as zfs_fhtovp.

-- 
Andriy Gapon


