Date:      Thu, 11 Feb 2021 14:32:03 -0700
From:      Alan Somers <asomers@freebsd.org>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: NFS delegations don't expire after unmounting client
Message-ID:  <CAOtMX2jQCRsUPaGw2uVb8XuguNnFzmdUt1OpPM8C7riE5Q%2BbfQ@mail.gmail.com>
In-Reply-To: <YQXPR0101MB0968EC580D4F4006E155AC9DDD8C9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>
References:  <CAOtMX2h_2zCNpyzOs=SzuohRvLgga=Eip-LJ-7QjJBvwmueLXg@mail.gmail.com> <YQXPR0101MB0968EC580D4F4006E155AC9DDD8C9@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM>

On Thu, Feb 11, 2021 at 2:07 PM Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Alan Somers wrote:
> >I have several Linux 5.9.15 clients mounting NFS 4.1 served from a FreeBSD
> >12.2-RELEASE server.  Today, most of those clients' mounts hung, and their
> >dmesg displayed "nfs: server XXX not responding, still trying".  But one
> >client kept running fine.  nfsdumpstate on the server showed that that
> >client, and that one only, had 2 delegations.  It also had 1 OpenOwner, 1
> >Open, and the CB flags set.  It was the only client that had CB set.  On
> >the theory that its delegation callbacks weren't working, I tried
> >unmounting all of its NFS shares.  That worked, but to my surprise
> >nfsdumpstate showed no change!  I could see that the lease time recorded in
> >/var/run/nfs-stablerestart was 120s, and I must've waited about 30m in all
> >before disabling delegations, unmounting everything, and returning to NFS
> >v3.  So my questions are, what can cause a delegation to linger around long
> >after it should've expired, and what else can I do to debug this problem if
> >it recurs?
>
> The FreeBSD NFSv4 server implements "courtesy locks" (my idea, but someone
> else coined the term for it), where a lock is not thrown away until both the
> lease has expired and a conflicting lock request is received from another
> client.
> --> In this case, that would be an Open of the file from another client.
> The idea is to avoid loss of lock state when there is a network partition
> that exceeds the lease duration.
>

Ahh, so maybe the stale delegation was a red herring!  That would make
sense.  Especially because the client with the stale delegation was
mounting a different share than at least one of the hung clients.
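
To check my understanding of the courtesy-lock rule described above, here is
a minimal Python sketch.  The names (ClientState, should_discard_state) are
made up for illustration and are not taken from the FreeBSD NFS server
source; the sketch only models the "lease expired AND conflicting request"
condition, using the 120s lease and roughly 30m wait from my case.

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    """Hypothetical stand-in for one client's open/lock/delegation state."""
    last_renewal: float    # seconds; when the client last renewed its lease
    lease_duration: float  # seconds; 120 in the setup described in this thread


def lease_expired(state: ClientState, now: float) -> bool:
    # The lease has lapsed once more than lease_duration has passed
    # since the last renewal.
    return now - state.last_renewal > state.lease_duration


def should_discard_state(state: ClientState, now: float,
                         conflicting_request: bool) -> bool:
    # Courtesy-lock rule as described: the state is thrown away only when
    # BOTH the lease has expired AND another client makes a conflicting
    # request (for a delegation, an Open of the same file).
    return lease_expired(state, now) and conflicting_request


state = ClientState(last_renewal=0.0, lease_duration=120.0)
thirty_minutes = 30 * 60.0

# Lease long expired, but nobody else wants the file: state lingers.
print(should_discard_state(state, thirty_minutes, conflicting_request=False))  # False
# Lease expired and another client opens the file: state can be dropped.
print(should_discard_state(state, thirty_minutes, conflicting_request=True))   # True
# Lease still valid: a conflicting request alone does not discard the state.
print(should_discard_state(state, 60.0, conflicting_request=True))             # False
```

Under that reading, a delegation still showing up in nfsdumpstate 30m after
the unmount is expected as long as no other client opened the same file.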


>
> When a client dismounts, it should tell the server it is done with the
> open/lock state by doing a DestroyClientID operation.
> (SetClientID/SetClientIDConfirm for 4.0)
> --> If the Linux client did this, then it sounds like something is broken
>       in the server, but my hunch is that the Linux client did not do this.
> If you can capture packets during a dismount, you should be able to look
> at them in wireshark and see if the DestroyClientID happened.
>
> There is also the nfsrevoke command, which is supposed to be able to
> get rid of client lock state, but I'll admit I haven't tested it in like a
> decade;-)
>

Well, it looks like it works.  When I tried it, the delegation disappeared
from nfsdumpstate's output.  That did not resolve the hang, however.  So
the delegation was probably a red herring then.

I guess I'll have to roll up my sleeves and start tcpdumping then.  Sigh.

Thanks for the tips.
-Alan
