Date:      Wed, 20 Dec 1995 18:16:16 -0700
From:      Nate Williams <nate@rocky.sri.MT.net>
To:        hackers@FreeBSD.org, isp@FreeBSD.org
Subject:   BSD networking code guru needed?
Message-ID:  <199512210116.SAA01444@rocky.sri.MT.net>

I'm seeing a fairly significant bug in FreeBSD's networking code
with regard to routing and arp, and I'm looking for someone who can help
me figure this out.

I've asked Garrett to help, but he has been too busy so I'm now looking
for someone else who is familiar with the networking code in FreeBSD.

I can reproduce the problem at will and can tell whoever helps me how
to re-create it.  I'm surprised that none of the ISPs have seen this,
but I suspect they aren't using proxy-arp, or aren't seeing folks
re-connect from broken PPP connections as fast as we do.

Basically, I'm using proxy-arp to set up routing for a couple of 'portable'
computers which can exist on either our local network, or at home.  I'm
using the same IP address for both locations, and I'm using proxy-arp
to allow the machines to sit behind our PPP server (a FreeBSD box).

When the line goes down and the remote machines are talking with a
machine on our local network, the PPP process (correctly) removes the
proxy-arp entry from the PPP server.  However, packets are still being
sent to that box from other machines on our network, which causes the
server box to send out an arp request onto the ethernet and add an
incomplete arp entry and route to the ethernet.  This would be acceptable
*except* that the remote box then re-connects to the server, which causes
PPP to proxy-arp again for the remote box.

What *should* happen is the incomplete arp entry and route should be
removed from the tables and replaced with the now valid proxy-arp entry.
What is happening is the proxy-arp entry is added to the table *after*
the incomplete arp entry, so the server machine doesn't know to route
traffic to the remote machine via the PPP link.  PPP is doing the
correct thing, and the remote machine is sending data to the server, but
the server doesn't know the correct route to get back to it since it
assumes the route is via the ethernet.
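
To make the ordering problem concrete, here is a toy model in Python.  It
is only a sketch under stated assumptions, not the FreeBSD kernel code: the
arp table is modelled as a plain list, lookup() is assumed to return the
first entry matching an IP address, and the names ArpEntry, add_append and
add_replacing_incomplete are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArpEntry:
    ip: str
    mac: Optional[str]        # None models an "(incomplete)" entry
    published: bool = False   # True models a proxy-arp ("published") entry


def lookup(table: List[ArpEntry], ip: str) -> Optional[ArpEntry]:
    """Return the first entry for ip (first-match rule is an assumption)."""
    for entry in table:
        if entry.ip == ip:
            return entry
    return None


def add_append(table: List[ArpEntry], entry: ArpEntry) -> None:
    """Observed behaviour: the new entry lands after any stale one."""
    table.append(entry)


def add_replacing_incomplete(table: List[ArpEntry], entry: ArpEntry) -> None:
    """Expected behaviour: drop incomplete entries for the same IP first."""
    table[:] = [e for e in table if not (e.ip == entry.ip and e.mac is None)]
    table.append(entry)


IP = "204.182.243.100"
MAC = "0:80:48:e8:27:63"

# The line dropped and a local host kept sending: incomplete entry exists.
observed = [ArpEntry(IP, None)]
add_append(observed, ArpEntry(IP, MAC, published=True))

expected = [ArpEntry(IP, None)]
add_replacing_incomplete(expected, ArpEntry(IP, MAC, published=True))

# In the observed case the lookup still hits the incomplete entry.
print(lookup(observed, IP))
print(lookup(expected, IP))
```

In this model the "observed" table keeps answering with the incomplete
entry, while the "expected" table ends up with only the proxy-arp entry.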

This problem occurs with normal arp as well as proxy-arp, so you can have
up to three arp entries for a single IP address in the arp table.

Here's what happens on my server box right now.

ws1.sri.MT.net (204.182.243.100) at (incomplete)
ws1.sri.MT.net (204.182.243.100) at 0:80:48:e8:27:63 permanent published
ws1.sri.MT.net (204.182.243.100) at 0:80:48:e8:27:63 permanent published (proxy only)
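
For anyone who wants to check their own server for the same symptom, a
small Python sketch like the following could flag addresses that appear
more than once.  It assumes the listing has already been captured as text
in the format shown above; the function name duplicate_arp_entries and the
regular expression are my own, not part of any FreeBSD tool.

```python
import re
from collections import Counter
from typing import Dict

# Matches lines such as: host (1.2.3.4) at <rest of line>
ARP_LINE = re.compile(r"^\S+ \((\d+\.\d+\.\d+\.\d+)\) at .+$")


def duplicate_arp_entries(listing: str) -> Dict[str, int]:
    """Return {ip: count} for every IP that has more than one arp entry."""
    counts: Counter = Counter()
    for line in listing.splitlines():
        match = ARP_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n > 1}


sample = """\
ws1.sri.MT.net (204.182.243.100) at (incomplete)
ws1.sri.MT.net (204.182.243.100) at 0:80:48:e8:27:63 permanent published
ws1.sri.MT.net (204.182.243.100) at 0:80:48:e8:27:63 permanent published (proxy only)
"""

print(duplicate_arp_entries(sample))
```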

Fun, huh?  I've got kernel dumps where the bogosity is occurring,
back-traces, and all sorts of programs to trigger the bug and more
information than you'll ever want to describe the problem, but I'm
beating my head against the wall trying to figure out the code flow, so
I'm appealing to the BSD gurus to help.

This problem won't occur if the arp entries time-out on both the remote
host and the server box.  If that happens, then the proxy-arp entry
which gets added by PPPD is the first in the arp table, and routing is
correct until the line goes down.

I've checked and neither SunOS 4.1 nor Solaris 2.4 has this bug, and I
don't have root access on any other OSes to test this out.

Please help!



Nate


