Date:      Fri, 01 Apr 2011 10:50:18 -0400
From:      Steve Polyack <korvus@comcast.net>
To:        Frederique Rijsdijk <frederique@isafeelin.org>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Network stack unstable after arp flapping
Message-ID:  <4D95E62A.5000109@comcast.net>
In-Reply-To: <20110401141655.GA5350@deta.isafeelin.org>
References:  <20110401141655.GA5350@deta.isafeelin.org>

On 04/01/11 10:16, Frederique Rijsdijk wrote:
> Hi,
>
> We (a hosting provider) are in the process of implementing IPv6 in our network (yay). Yesterday we carried out one of the final steps in configuring and updating our core routers, and it did not go entirely as planned. As a result, the MAC address of the default gateway for all our machines changed about 800 times in a span of about 4 minutes.
>
> Here's a small piece of the logging:
>
> Mar 31 18:36:12 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:9f:f0:3d to 00:00:0c:07:ac:3d on bge0
> Mar 31 18:36:12 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:07:ac:3d to 00:00:0c:9f:f0:3d on bge0
> Mar 31 18:36:13 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:9f:f0:3d to 00:00:0c:07:ac:3d on bge0
> Mar 31 18:36:14 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:07:ac:3d to 00:00:0c:9f:f0:3d on bge0
> Mar 31 18:36:14 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:9f:f0:3d to 00:00:0c:07:ac:3d on bge0
> Mar 31 18:36:14 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:07:ac:3d to 00:00:0c:9f:f0:3d on bge0
> Mar 31 18:36:15 srv01 kernel: arp: x.x.x.1 moved from 00:00:0c:9f:f0:3d to 00:00:0c:07:ac:3d on bge0
>
> The x.x.x.1 is always the same IP, the gateway of the machine.
>
> As a result, many FreeBSD machines (6.x, 7.x and 8.x) developed serious network issues, mainly no traffic or only slow traffic to other (FreeBSD) machines across different VLANs in our own network.
>
> The first thing that comes to mind is the network itself, but none of the Linux machines (Ubuntu, Red Hat and CentOS) had any issues at all. Only the BSD machines did.
>
> Running arp -ad on both machines where the problems occurred didn't solve anything. What worked better was /etc/rc.d/netif restart followed by /etc/rc.d/routing restart. Some machines even had to be rebooted to get networking back to normal.
>
> This almost sounds like a bug in the BSD network stack, but I can't imagine that I'm right. The BSD networking stack is considered one of the best.
>
> Any ideas anyone?
We experienced a similar issue here, but IIRC only on our 8.x systems
(we don't have any 7.x).  Disabling the flowtable cleared everything up
immediately, so you can try that and see if it helps.  It seems that the
flowtable caches the next-hop router's MAC address with each flow, and
unfortunately that cached entry is not purged when the kernel detects
and logs an ARP change.  The only other workaround I've seen was to stop
all network traffic on the machine until the flow cache entries expired.
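To make the suspected mechanism concrete, here is a toy model in Python. It is a hypothetical sketch, not FreeBSD's flowtable code: `ArpTable`, `FlowCache` and the `purge_gateway` hook are invented names, and the real kernel stores route and link-layer data differently. It only shows why a per-flow cache that copies the next-hop MAC when a flow starts keeps using the old address after the ARP entry changes, unless ARP updates explicitly purge it.

```python
# Toy model only: NOT FreeBSD's flowtable implementation. ArpTable,
# FlowCache and purge_gateway are hypothetical names for illustration.


class ArpTable:
    """Maps an IP address to a MAC address and notifies listeners on change."""

    def __init__(self):
        self.entries = {}    # IP -> MAC
        self.listeners = []  # callbacks invoked with the IP whose MAC moved

    def update(self, ip, mac):
        old = self.entries.get(ip)
        self.entries[ip] = mac
        if old is not None and old != mac:
            # Roughly where the kernel logs "arp: ... moved from ... to ..."
            for callback in self.listeners:
                callback(ip)

    def lookup(self, ip):
        return self.entries[ip]


class FlowCache:
    """Per-flow cache that copies the gateway's MAC when a flow is first seen."""

    def __init__(self, arp, invalidate_on_arp_change):
        self.arp = arp
        self.flows = {}  # flow key -> (gateway IP, cached next-hop MAC)
        if invalidate_on_arp_change:
            # Hypothetical fix: drop cached flows whose gateway MAC changed.
            arp.listeners.append(self.purge_gateway)

    def next_hop_mac(self, flow, gateway):
        if flow not in self.flows:
            self.flows[flow] = (gateway, self.arp.lookup(gateway))
        return self.flows[flow][1]

    def purge_gateway(self, ip):
        self.flows = {k: v for k, v in self.flows.items() if v[0] != ip}


if __name__ == "__main__":
    gw = "x.x.x.1"  # placeholder gateway address, as in the quoted log
    arp = ArpTable()
    arp.update(gw, "00:00:0c:9f:f0:3d")
    stale = FlowCache(arp, invalidate_on_arp_change=False)
    fresh = FlowCache(arp, invalidate_on_arp_change=True)
    flow = ("192.0.2.10", "198.51.100.20", 6, 40000, 22)  # src, dst, proto, sport, dport
    stale.next_hop_mac(flow, gw)
    fresh.next_hop_mac(flow, gw)
    arp.update(gw, "00:00:0c:07:ac:3d")  # the gateway "moves"
    print("no purge:  ", stale.next_hop_mac(flow, gw))  # still the old MAC
    print("with purge:", fresh.next_hop_mac(flow, gw))  # the new MAC
```

The model leaves out entry expiry, so its unpurged cache returns the old MAC indefinitely. Real flow entries eventually time out, which fits the "wait until the flows expire" workaround.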

http://www.freebsd.org/cgi/query-pr.cgi?pr=155604 has more details of my
run-in with this.  The title should be corrected, though, as I found out
shortly afterwards that all traffic is affected.

- Steve


