Date:      Fri, 31 Aug 2007 17:31:29 +0200
From:      Tobias Ernst <tobi@casino.uni-stuttgart.de>
To:        freebsd-questions@freebsd.org
Subject:   strange arp problem with bge nics
Message-ID:  <46D83451.2030808@casino.uni-stuttgart.de>

Dear all,

I've got two xSeries 346 servers here, each with a total of six Broadcom
gigabit NICs. I'm going to build a firewall with them, but right now I'm
in an early testing stage. The OS is FreeBSD 6.2-RELEASE for amd64.

Each of the machines is currently configured to have an IP from our
internal LAN on bge0. I use that link to ssh into the machines for
testing purposes. (This is a temporary solution, of course.) Both
machines have their bge0 connected to our primary switch, where dozens
of other computers are connected as well. Networking works normally here.

Each machine also has an IP address from a different network on its
bge5 interface. The bge5 interfaces are connected to a switch that has
no other connections, i.e. this is a two-machine network for testing
purposes.

My problem is this: I can ping machine #2 from machine #1 when using the
IP addresses configured on the bge0 NICs, but I cannot ping the other
machine when using the IP addresses configured on the bge5 NICs, because
the ARP entries remain incomplete. If I then put bge5 into promiscuous
mode on one machine, the ping starts working after about 10 seconds.


Here's what ifconfig and netstat -nr say right after booting:

Machine #1:

bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=1b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
        inet XX.XX.159.253 netmask 0xfffffe00 broadcast XX.XX.159.255
        ether 00:14:5e:ac:71:c9

bge5: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=1b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
        inet XX.XX.248.158 netmask 0xffffff00 broadcast XX.XX.248.255
        ether 00:10:18:11:72:40

Destination       Gateway            Flags    Refs      Use  Netif
default           141.58.159.254     UGS         0        0   bge0
127.0.0.1         127.0.0.1          UH          0        0    lo0
XX.XX.158/23      link#1             UC          0        0   bge0
XX.XX.158.1       00:17:f2:93:01:30  UHLW        1        3   bge0
XX.XX.159.254     00:04:76:19:03:de  UHLW        2        0   bge0
XX.XX.248/24      link#6             UC          0        0   bge5

Machine #2:

bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=1b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
        inet XX.XX.159.252 netmask 0xfffffe00 broadcast XX.XX.159.255
        ether 00:14:5e:b4:2e:82

bge5: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        options=1b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
        inet XX.XX.248.254 netmask 0xffffff00 broadcast XX.XX.248.255
        ether 00:10:18:11:6f:45

Destination       Gateway            Flags    Refs      Use  Netif
default           XX.XX.159.254      UGS         0        0   bge0
127.0.0.1         127.0.0.1          UH          0        0    lo0
XX.XX.158/23      link#1             UC          0        0   bge0
XX.XX.158.1       00:17:f2:93:01:30  UHLW        1       14   bge0
XX.XX.159.254     00:04:76:19:03:de  UHLW        2        0   bge0
XX.XX.248/24      link#6             UC          0        0   bge5
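
As a cross-check on the output above, the hexadecimal netmasks printed by
ifconfig can be converted to prefix lengths and compared with the /23 and
/24 routes in the routing tables. The following is only a minimal Python
sketch; the function name prefix_length is a hypothetical helper and not
part of any FreeBSD tool.

```python
def prefix_length(hex_netmask: str) -> int:
    """Convert a hex IPv4 netmask such as '0xfffffe00' to a prefix length.

    prefix_length is a hypothetical helper name used for illustration.
    """
    value = int(hex_netmask, 16)
    # A valid netmask is a run of 1 bits followed only by 0 bits, so the
    # inverted mask must have the form 2**k - 1.
    inverted = value ^ 0xFFFFFFFF
    if value > 0xFFFFFFFF or (inverted & (inverted + 1)) != 0:
        raise ValueError(f"not a contiguous IPv4 netmask: {hex_netmask}")
    return bin(value).count("1")


print(prefix_length("0xfffffe00"))  # bge0 netmask, prints 23
print(prefix_length("0xffffff00"))  # bge5 netmask, prints 24
```

Both values agree with the link#1 (/23) and link#6 (/24) routes shown on
each machine, so the interface and routing configuration is consistent.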

Now, if I ping XX.XX.248.254 from machine #1, I get "Sendto: Host is
down". The ARP table looks like this:

XXXXXXXXXXXXXXXXx.de (XX.XX.248.254) at (incomplete) on bge5 [ethernet]

This goes on indefinitely. I can then do "ifconfig bge5 promisc" on
EITHER of the two machines (machine #1 or machine #2), and about 10
seconds later the ARP entry on machine #1 is completed. From then on the
network connection works normally, even if I do "ifconfig bge5 -promisc"
afterwards. I can even delete the ARP table entries on both machines, and
they are reinstated as soon as I issue the next ping. I need to reboot to
trigger the strange behaviour again.
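
To measure how long an entry stays unresolved, one could poll the ARP
table and look for lines in the "(incomplete)" form quoted above. The
following is a rough Python sketch, not a tested diagnostic. The function
name incomplete_arp_entries is hypothetical, and it assumes the table text
(for example from "arp -an") uses the same line format as the entry shown
in this message.

```python
import re

# Matches lines like:
#   host (XX.XX.248.254) at (incomplete) on bge5 [ethernet]
# The pattern is derived from the single ARP line quoted in this message.
_INCOMPLETE = re.compile(
    r"\((?P<ip>[^)]+)\) at \(incomplete\) on (?P<iface>\S+)"
)


def incomplete_arp_entries(arp_output: str):
    """Return (ip, interface) pairs for unresolved ARP entries.

    incomplete_arp_entries is a hypothetical helper for illustration.
    """
    return [
        (m.group("ip"), m.group("iface"))
        for m in _INCOMPLETE.finditer(arp_output)
    ]


sample = "XXXXXXXXXXXXXXXXx.de (XX.XX.248.254) at (incomplete) on bge5 [ethernet]"
print(incomplete_arp_entries(sample))
```

Running this in a loop before and after "ifconfig bge5 promisc" would show
when the entry leaves the incomplete state.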

I have already tried to use a different switch and have also tried using
a crosslink cable. Both show the same behaviour.

This is a vanilla install of 6.2-RELEASE. No firewalling of any sort is
enabled yet. The only thing I did was add "option BRIDGE" to the kernel
config on machine #1 and build a custom kernel, i.e. my kernel config on
machine #1 differs from GENERIC only in that one line. Machine #2 still
has the binary kernel from the CD.

Am I overlooking something or is this a bug? What should I do next? I am
not going to run the machines in the particular configuration described
above, but I am now worried that there might be a bug in the "bge"
driver and that I should not put these machines in production at all, at
least not with FreeBSD.

Regards
Tobias

-- 
Universität Stuttgart|Fakultät für Architektur und Stadtplanung|casinoIT
70174 Stuttgart Geschwister-Scholl-Straße 24D
T +49 (0)711 121-4228             F +49 (0)711 121-4276
E office@casino.uni-stuttgart.de  I http://www.casino.uni-stuttgart.de


