Date:      Tue, 15 Jan 2013 18:32:05 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Sergey Kandaurov <pluknet@gmail.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: getcwd lies on/under nfs4-mounted zfs dataset
Message-ID:  <2118820107.2024400.1358292725230.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <CAE-mSOJk1HbxvF=ZpoSP21b9j65qMov=AE-OM6wcUkbadQeZbw@mail.gmail.com>

pluknet@gmail.com wrote:
> Hi.
> 
> We are stuck with a problem of getcwd returning a wrong current directory
> path when sitting on/under a zfs dataset filesystem mounted over NFSv4.
> Both nfs server and client are 10.0-CURRENT from December or so.
> 
> The path component "user3" unexpectedly shows up as "." (dot):
> nfs-client:/home/user3 # pwd
> /home/.
> nfs-client:/home/user3/var/run # pwd
> /home/./var/run
> 
> nfs-client:~ # procstat -f 3225
> PID COMM FD T V FLAGS REF OFFSET PRO NAME
> 3225 a.out text v r r-------- - - - /home/./var/a.out
> 3225 a.out ctty v c rw------- - - - /dev/pts/2
> 3225 a.out cwd v d r-------- - - - /home/./var
> 3225 a.out root v d r-------- - - - /
> 
> The setup used is as follows.
> 
> 1. NFS Server with local ZFS:
> # cat /etc/exports
> V4: / -sec=sys
> 
> # zfs list
> pool1 10.4M 122G 580K /pool1
> pool1/user3 on /pool1/user3 (zfs, NFS exported, local, nfsv4acls)
> 
> Exports list on localhost:
> /pool1/user3 109.70.28.0
> /pool1 109.70.28.0
> 
> # zfs get sharenfs pool1/user3
> NAME PROPERTY VALUE SOURCE
> pool1/user3 sharenfs -alldirs -maproot=root -network=109.70.28.0/24 local
> 
> 2. pool1 is mounted on NFSv4 client:
> nfs-server:/pool1 on /home (nfs, noatime, nfsv4acls)
> 
> So on the NFS client the "pool1/user3" dataset ends up at /home/user3.
> / - ufs
> /home - zpool-over-nfsv4
> /home/user3 - zfs dataset "pool1/user3"
> 
> At the same time it works as expected when we're not on a zfs dataset,
> but directly on its parent zfs pool (also over NFSv4), e.g.
> nfs-client:/home/non_dataset_dir # pwd
> /home/non_dataset_dir
> 
> The ls command works as expected:
> nfs-client:/# ls -dl /home/user3/var/
> drwxrwxrwt+ 6 root wheel 6 Jan 10 16:19 /home/user3/var/
> 
Well, if you are just looking for a workaround, you could try mounting
/home/user3 separately.
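
For example, something along these lines on the client ought to do it
(untested, but based on the "V4: /" root in your exports file and the
paths above):

# mount -t nfs -o nfsv4 nfs-server:/pool1/user3 /home/user3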

Otherwise, here's roughly what needs to happen for it to work. (There
may be some additional trick(s) I am not aware of.)

On the server, ZFS must report:
- different fsids for /home vs /home/user3
- fileno (A) must be the same value for "." and ".." for the zfs dataset root
  (and set VV_ROOT on the vnode)
- fileno (B) for "user3" reported by readdir() on /home must be different
  than what "." and ".." report.

Then the NFS server will report a different value (B) for Mounted_on_fileno
than it does for Fileno (A), when the client gets attributes for the directory
/home/user3.

When the client sees Mounted_on_fileno != Fileno, it knows it is at a server
mount point boundary and should report the correct values to stat() and readdir().
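
In case it helps to see why the name gets lost: here is a rough sketch of the
classic readdir()-based way of building a working directory path (this is not
FreeBSD's actual libc/kernel code, just an illustration). At each level it has
to find the entry in ".." whose fileno matches the directory it came from, so
if the filenos the server hands out never match up, the component name simply
can't be recovered:

#include <sys/stat.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    char path[PATH_MAX] = "", tmp[PATH_MAX], ent[PATH_MAX];
    struct stat cur, up, sb;
    struct dirent *de;
    DIR *dp;

    if (stat(".", &cur) == -1)
        return (1);
    for (;;) {
        if (stat("..", &up) == -1)
            return (1);
        /* At the root, "." and ".." are the same object. */
        if (cur.st_dev == up.st_dev && cur.st_ino == up.st_ino)
            break;
        if ((dp = opendir("..")) == NULL)
            return (1);
        ent[0] = '\0';
        while ((de = readdir(dp)) != NULL) {
            if (strcmp(de->d_name, ".") == 0 ||
                strcmp(de->d_name, "..") == 0)
                continue;
            /*
             * Same file system: match by the fileno readdir()
             * reports.  Across a mount boundary filenos are not
             * comparable, so stat() each entry instead.
             */
            if (cur.st_dev == up.st_dev) {
                if (de->d_fileno != cur.st_ino)
                    continue;
            } else {
                snprintf(tmp, sizeof(tmp), "../%s", de->d_name);
                if (lstat(tmp, &sb) == -1 ||
                    sb.st_dev != cur.st_dev || sb.st_ino != cur.st_ino)
                    continue;
            }
            snprintf(ent, sizeof(ent), "%s", de->d_name);
            break;
        }
        closedir(dp);
        if (ent[0] == '\0')
            return (1);    /* no matching entry: the name is lost */
        snprintf(tmp, sizeof(tmp), "/%s%s", ent, path);
        snprintf(path, sizeof(path), "%s", tmp);
        if (chdir("..") == -1)
            return (1);
        cur = up;
    }
    printf("%s\n", path[0] != '\0' ? path : "/");
    return (0);
}

I'm not claiming this is exactly how the "." ends up in your output, but it
shows why consistent Fileno/Mounted_on_fileno values are needed for any path
reconstruction to work.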

I haven't tested this for a while, so it might be broken for UFS as well. If that's
the case, I can probably try and track down the problem here.
If not, you can capture packets when you do the getcwd() and then look at them
in wireshark, so you can see what the server is returning for Fileno and 
Mounted_on_fileno. They should be different for "/home/user3" and the latter
one should be the value returned by Readdir of "/home" for the "user3" entry.
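
If you want to capture, something like the following should be enough for
wireshark to decode (the interface name is just a guess; NFSv4 normally runs
over TCP port 2049):

# tcpdump -s 0 -i em0 -w /tmp/nfsv4.pcap host nfs-server and port 2049

The attributes to look at in the GETATTR replies are what NFSv4 calls fileid
and mounted_on_fileid.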

I won't be in a position to look at a wireshark trace until April, so I can't
help with that at this time.

Since I've never used ZFS, I have no idea what it considers a "mount point"?
(Generically, within a mount point there needs to be "same fsid" and a unique
 set of "fileno" values for all objects. When crossing the mount point, VV_ROOT
 needs to be set and the mounted_on_vp (or whatever it's called) must refer
 to the parent. "/home" for this case.)

I don't know if this helps, rick
ps: Solaris10 clients don't get this to work, so you always need to mount
    each server file system separately, which is the "work around" I suggested
    at the beginning of this post.

> --
> wbr,
> pluknet


