Date:      Sat, 06 Feb 2010 19:11:53 -0700 (MST)
From:      "M. Warner Losh" <imp@bsdimp.com>
To:        net@FreeBSD.org
Subject:   How does rpc.lockd know where to send a request
Message-ID:  <20100206.191153.401093655925072575.imp@bsdimp.com>

I have a problem.  All systems are running freebsd-current from
sometime in the last month, although similar systems running
8.0-RELEASE exhibit exactly the same problem.  rpc.lockd on an NFS
client is doing something that baffles my mind entirely, maybe you can
help.  Please bear with me, this is a little complicated, but I wanted
to include all the details.

I have a host, let's call it dune.  dune is at 10.0.0.5.  dune is also
the master for the carp interface 10.0.0.99.  It is running rpc.lockd
and is an nfs server.  I've told nfs, rpcbind, lockd and statd to only
listen on address 10.0.0.99.

I have a second host.  maud-dib is 10.0.0.8.  I do "mount
10.0.0.99:/dune /dune" on maud-dib.  Wireshark shows all the traffic
going to 10.0.0.99.  All is happy in the world.  When I start, there's
no ARP entry for 10.0.0.5 on 10.0.0.8, nor is there after the mount.

Until I run 'lockf /dune/imp/junk ls' (I have write perms
to /dune/imp).  At this point, rpc.lockd hangs.  I get the message
"10.0.0.99:/dune: lockd not responding" which seems odd.  lockd is
really there.  However, wireshark shows the NLM traffic going to IP
address 10.0.0.5.  maud-dib has no carp interfaces.

That's odd.  So my question is 'how does lockd know where to go to
talk the NLM protocol?'

I did a packet capture from before I did the mount on maud-dib.  I can
see the NFS mount, the NFS traffic, all to 10.0.0.99.  I then see an
ARP for 10.0.0.5, followed by the NLM request from 10.0.0.8 to
10.0.0.5.  This gets an ICMP port unreachable message, since I told
nfs, et al, to bind only to 10.0.0.99.

So, I thought, 'the answer is obvious, I'll just look for the packet
that has the string "dune" in it' (dune being the hostname of 10.0.0.5).
No packet has that string in it, other than the mount packet, which
has /dune in it.  Nor is there any DNS activity doing a lookup.  Nor
is there any static mapping in /etc/hosts on 10.0.0.8.

Next thought: Oh, somebody like portmapper or the NFS protocol from
10.0.0.99 is telling 10.0.0.8's rpc.lockd (or something else) to do
locking requests to 10.0.0.5.  That's trivial to find, I think to
myself.  I'll look for the octets 0a 00 00 05 (hex).  The only
instances of that are in the ARP packet, the NLM request and the ICMP
unreachable packets.  No other packet includes these bytes.  Nor does
any include them in reverse order.
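
A minimal sketch of this kind of byte search, in Python.  It looks for
the address in network byte order (0a 00 00 05), in reversed byte
order, and also as dotted-quad ASCII text.  The ASCII pattern is an
addition to the search described above: rpcbind versions 3 and 4 carry
addresses as text "universal address" strings (h1.h2.h3.h4.p1.p2), so
a purely binary search would not see one.  Whether such a string is
present in this capture is not established by anything above.  The
function names and both sample payloads are hypothetical, made up for
illustration.

```python
import socket


def address_patterns(ip):
    """Return the byte patterns under which an IPv4 address might appear."""
    raw = socket.inet_aton(ip)  # network byte order, e.g. 0a 00 00 05
    return {
        "network-order": raw,
        "reversed": raw[::-1],
        "ascii": ip.encode("ascii"),  # dotted-quad text form
    }


def find_address(payload, ip):
    """List (pattern name, offset) for each occurrence of a pattern in payload."""
    hits = []
    for name, pattern in address_patterns(ip).items():
        start = payload.find(pattern)
        while start != -1:
            hits.append((name, start))
            start = payload.find(pattern, start + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical payloads, not taken from the capture discussed here.
    binary_payload = b"\x00\x00\x00\x01" + bytes([10, 0, 0, 5]) + b"\x00\x00"
    text_payload = b"\x00\x00\x00\x0e" + b"10.0.0.5.3.255" + b"\x00\x00"
    print(find_address(binary_payload, "10.0.0.5"))
    print(find_address(text_payload, "10.0.0.5"))
```

The ASCII match is a plain substring test, so it would also hit text
such as 10.0.0.50; any hit needs a look at the surrounding bytes.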

Right after the mount, there's nothing in the connection table that
points to 10.0.0.5, only 10.0.0.99.

So I'm having a serious WTF moment.  How the heck is this even
possible?  Any ideas on where to look for where this gets set and/or
communicated?

thanks a bunch for any insight that you can give...

Warner


