Date:      Wed, 5 Jul 2006 15:20:40 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Robert Watson <rwatson@freebsd.org>
Cc:        freebsd-stable@freebsd.org, Michel Talon <talon@lpthe.jussieu.fr>
Subject:   Re: NFS Locking Issue
Message-ID:  <20060705122040.GN37822@deviant.kiev.zoral.com.ua>
In-Reply-To: <20060705113822.GM37822@deviant.kiev.zoral.com.ua>
References:  <E1FxzUU-000MMw-5m@cs1.cs.huji.ac.il> <20060705100403.Y80381@fledge.watson.org> <20060705113822.GM37822@deviant.kiev.zoral.com.ua>

On Wed, Jul 05, 2006 at 02:38:22PM +0300, Kostik Belousov wrote:
> On Wed, Jul 05, 2006 at 10:09:24AM +0100, Robert Watson wrote:
> > The most significant problem working with rpc.lockd is creating easy to
> > reproduce test cases.  Not least because they can potentially involve
> > multiple clients.  If you can help to produce simple test cases to
> > reproduce the bugs you're seeing, that would be invaluable.
> >
> ........
> >
> > Reducing complex failure modes to easily reproduced test cases is tricky
> > also, though.  It requires careful analysis, often with ktrace and
> > tcpdump/ethereal to work out what's going on, and not a little luck to
> > perform the reduction of a large trace down to a simple test scenario.  The
> > first step is to try and figure out what, if any, specific workload results
> > in a problem.  For example, can you trigger it using work on just one
> > client against a server, without client<->client interactions?  This makes
> > tracking and reproduction a lot easier, as multi-client test cases are
> > really tricky!  Once you've established whether it can be reproduced with a
> > single client, you have to track down the behavior that triggers it --
> > normally, this is done by attempting to narrow down the specific program or
> > sequence of events that causes the bug to trigger, removing things one at a
> > time to see what causes the problem to disappear.  This is made more
> > difficult as lock managers are sensitive to timing, so removing a high load
> > item from the list, even if it isn't the source of the problem, might cause
> > it to trigger less frequently.
>
> I made the patch for rpc.lockd that could somewhat ease obtaining
> debug information. Patch is available at
> http://people.freebsd.org/~kib/rpc.lockd-debug.patch
>
> No functional changes. Patch only adds dumping of currently held locks
> (as perceived by lockd) on receiving of SIGUSR1. You need to specify
> debug level 2 or 3 to obtain the dump.
>
> Also, the both lockd processes now put identification information
> in the proctitle (srv and kern). SIGUSR1 shall be sent to srv process.

Hmm, after looking at the dump and reading some of the code, I have noted
the following:

1. The NLM lock request contains the field caller_name. It is filled in by
the (let's call it) kernel rpc.lockd with the result of hostname(3).

2. This caller_name is used by the server rpc.lockd to send a host
monitoring request to rpc.statd (see send_granted).
The request is made with clnt_call, which is a blocking RPC call.

3. rpc.statd calls getaddrinfo on caller_name to determine the address of
the host to monitor.

If the getaddrinfo in step 3 waits on the resolver, then the locking
process on your client machine will sit in the "lockd" state.

Could people experiencing the rpc.lockd mystery at least report whether
the _server_ machine successfully resolves the hostnames of the clients, as
reported by hostname? And, if yes, to which IP protocol family?
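
For anyone who wants to collect that report quickly, here is a rough sketch
of the check. It is not taken from rpc.statd; it only calls the system
resolver through Python's socket.getaddrinfo, which I assume behaves closely
enough to the libc call for this purpose. The function name resolve_report
and its resolver parameter are made up for illustration.

```python
import socket
import time

# Human-readable labels for the two families of interest.
FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def resolve_report(name, resolver=socket.getaddrinfo):
    """Resolve `name` and return (families, addresses, seconds_elapsed).

    `resolver` is a hypothetical hook so the lookup can be replaced in
    tests; by default the system resolver is used.
    """
    start = time.monotonic()
    try:
        results = resolver(name, None)
    except socket.gaierror:
        # The name does not resolve at all.
        results = []
    elapsed = time.monotonic() - start

    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address for both IPv4 and IPv6.
    families = sorted({FAMILY_NAMES.get(r[0], str(r[0])) for r in results})
    addresses = sorted({r[4][0] for r in results})
    return families, addresses, elapsed
```

Run it on the _server_, passing the exact string that hostname prints on the
client, e.g. print(resolve_report("client.example.org")) with your own client
name substituted. An empty family list, or an elapsed time of many seconds,
would both be worth reporting.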
