Date:      Sat, 19 Feb 2011 16:20:24 -0500 (EST)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Lawrence Stewart <lstewart@freebsd.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Mounting NFSv4 as root fs
Message-ID:  <1194340518.139022.1298150424303.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4D5F0825.9010607@freebsd.org>

> On 02/19/11 08:38, Rick Macklem wrote:
> >> Hi Rick,
> >>
> >> I've set up an NFS server to pxeboot a set of testbed clients from.
> >> The server filesystem tree the client needs to use as its root has
> >> nullfs mounted directories in it. Therefore, NFSv4 is the only useful
> >> way to mount it on the client because of the cross-mount-point
> >> traversal capabilities built into v4. I've verified that I can
> >> "mount_nfs -o nfsv4 ..." on the command line and see all the files in
> >> the tree, so I have things working fine on the server side.
> >>
> >> I was aware our pxeboot only supports NFSv3, but hoped that by
> >> specifying "newnfs" and "nfsv4" in the fstype and options fields
> >> respectively in fstab, things might just work when the mount root
> >> step happens after the kernel boots. It doesn't, as I found out,
> >> because of two issues:
> >>
> >> 1. I believe there is a bug in the newnfs code. nfs_diskless.c wasn't
> >> copied from the old nfsclient and suitably modified for use with
> >> newnfs. As a result during boot, the ncl_mountroot() function in
> >> nfs_clvfsops.c calls nfs_setup_diskless() which calls into the old nfs
> >> code and badness happens from there on in. I have a patch which fixes
> >> this issue, though it may be completely the wrong way to do things as
> >> I'm very new (as in 24 hours new) to the NFS code.
> >>
> > Yep. I didn't see an easy way to set up the diskless root so that it
> > would work for both clients concurrently, so I was planning on
> > switching it if/when "newnfs" becomes the default client. (You can
> > switch fairly easily. Just crib the code across, as it sounds like you
> > have, and then make sure the xxx_mountroot() in "newnfs" gets called
> > instead of nfs_mountroot() in the other one.)
> 
> Yes that's exactly what I did.
> 
> > However, that will just get a "newnfs" NFSv3 root mount to work.
> 
> Yup, confirmed working as expected (mount output shows "newnfs" for /,
> whereas before it would fall back to "nfs" after the newnfs code
> crapped out).
> 
> >> 2. pxeboot stores the filehandle and filehandle length it used to
> >> grab the kernel via NFS in the kernel's env and after the kernel has
> >> booted, it looks for these variables and reuses them i.e. at no point
> >> in the process does the code attempt to upgrade to NFSv4 if the
> >> bootstrap uses NFSv3 to grab the kernel.
> >>
> >> For my particular use case, I'm quite happy for the kernel to be
> >> pulled via NFSv3, but can't boot the client without somehow getting
> >> the client to switch to NFSv4 at the point where it mounts root after
> >> the kernel has finished booting.
> >>
> >> I tried a very hacky test in mountnfs() in nfs_clvfsops.c to see if I
> >> could set the NFSV4 flag, unset the V3 flag and tell the code to
> >> forget about the cached file handle set by the loader just to see if
> >> the code would try to renegotiate using v4... it crashed and burned.
> >>
> > The same file handle should work for NFSv4 (at least a FreeBSD server
> > generates the same FH for a v3 vs. v4 mount).
> 
> Ah, interesting and good to know, thanks. So assuming the server is v4
> capable, you can just start issuing v4 RPCs to the handle established
> by pxeboot and things should keep working?
> 
Should might be too strong a word, but at least for the FreeBSD server,
yes. (I don't know about other servers; I just suspect that the FHs will
be the same. The FH changed from NFSv2 to NFSv3 because NFSv3 made it
variable-size. Most servers fill in the same FH for NFSv2 and then just
pad it to 32 bytes.)

> >> So, before I spend any more time on this, I hope to get your (or
> >> anyone else reading for that matter) thoughts on how best to proceed.
> >> Some questions:
> >>
> >> - Could you guesstimate how much work is involved to get v4 support
> >> into libstand so that pxeboot can talk v4 natively? I spent quite
> >> some time poking at libstand's code last night but don't yet
> >> understand the NFSv4 RPC mechanism well enough to attempt writing the
> >> basic code to do it. The RFC explains the ordering of OPs needed quite
> >> well, but I don't quite grok how the data structures for interpreting
> >> responses work.
> >>
> > Lots. It will be easier to get the kernel to use v4 after pxeboot has
> > loaded it via v3.
> 
> ACK.
> 
> >> - Can you think of a hacky simple way to force my client to
> >> renegotiate the mount as v4 at the time mount root happens?
> >>
> > If you are willing to spend man-weeks on this, you can probably get
> > something to work for your lab (useless for others, because you'll
> > have to hard-wire a bunch of stuff into the kernel, like your DNS
> > domain name...).
> >
> > I have never intended to try and make an NFSv4 root mount work.
> > (Someone said NFSv4 is NFS in name only:-)
> >
> > One of the most difficult parts will be the uid/gid<->name mapping.
> > You would have to hack this enough so that it worked without
> > nfsuserd. Something like hard-wiring mappings into the kernel cache
> > for enough entries that the root works. (Note that names look like
> > root@cis.uoguelph.ca, so it needs to know the DNS domain as well as
> > "root" == uid 0.) Then hopefully you don't need other mappings to
> > work, because it would have to work both without nfsuserd running and
> > with nfsuserd running (in the root fs).
> >
> > Short answer. A severely hacked kernel might work for your lab, but a
> > generic solution for FreeBSD would be very difficult.
> 
> Thanks heaps for the brain dump, it really helps put things in
> perspective. It's sounding like a much bigger job than I thought it
> would be, even for a hacked up lab-only solution.
> 
> > If you could move the "nullfs" mounts down a level, so the NFSv4 mount
> > was below an NFSv3 root fs, that would be much easier.
> 
> Agreed. The issue is we're using the ezjail management script from
> ports to manage the bootable client filesystems on the server, and it
> uses nullfs mounts between a base filesystem and the client
> filesystems to avoid duplicating all the utilities/libs in /bin, /sbin,
> /lib and /libexec multiple times. Works well, but not for this use
> case... oh well.
> 
> I guess it will be significantly easier to hack ezjail to just copy
> the dirs from the basejail into each client rather than try to get the
> all-singing, all-dancing NFSv4 option going.
> 
> Thanks again for your insights.
> 
> Cheers,
> Lawrence


