Date:      Wed, 30 May 2007 17:44:43 -0400
From:      Vinny Abello <vinny@tellurian.com>
To:        Vinny Abello <vinny@tellurian.com>, security <security@jim-liesl.org>,  freebsd-stable@freebsd.org
Subject:   Re: Packet Loss w/bge & BCM5703 on Dell PE2650
Message-ID:  <465DF04B.3090009@tellurian.com>
In-Reply-To: <20070530190340.GA28312@eos.sc1.parodius.com>
References:  <200705300934.l4U9Y7eJ022617@lurza.secnetix.de> <465DB5B2.8040707@tellurian.com> <20070530174631.GA15795@eos.sc1.parodius.com> <499c70c0705301113n58588719j6fb35154701f3cb1@mail.gmail.com> <465DC050.8020707@tellurian.com> <465DC2BA.6040709@jim-liesl.org> <465DC746.2090401@tellurian.com> <20070530190340.GA28312@eos.sc1.parodius.com>

Jeremy Chadwick wrote:
> On Wed, May 30, 2007 at 02:49:42PM -0400, Vinny Abello wrote:
>> All suggestions welcomed:
> 
> Your mbuf counts look OK.  I don't see anything there which looks
> like a problem.  If you had packet loss caused by mbuf exhaustion,
> your FreeBSD console log would show something.
> 
> I have a couple of questions:
> 
> 1) Checked console logs (dmesg -a) to see if there's anything there
>    which might give you hints to the problem?

I'm on the console. Nothing comes up when there is packet loss, and
there is nothing in the dmesg -a output either. Interestingly, I
sometimes forget to adjust icmplim, which artificially limits the pings
in my test. On 4.11, the kernel would report that it was rate limiting;
on 6.2 it gives no indication. I did confirm that it is not being rate
limited, though, by setting net.inet.icmp.icmplim to some high value
(-1 also seems to work).
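
To illustrate why a forgotten icmplim can look like packet loss in a
fast ping test, here is a rough sketch. It assumes the limiter simply
caps echo replies at icmplim per second and that non-positive values
disable it (consistent with the -1 observation above, but an assumption
all the same). The function name and the value 200 are illustrative
only.

```python
def expected_loss(ping_rate_pps, icmplim):
    """Fraction of echo requests left unanswered under a simple per-second cap."""
    if ping_rate_pps <= 0:
        raise ValueError("ping_rate_pps must be positive")
    if icmplim <= 0:
        # Assumption: a non-positive limit means rate limiting is off.
        return 0.0
    # Replies are capped at icmplim per second; anything above that is dropped.
    return max(0.0, 1.0 - icmplim / ping_rate_pps)


if __name__ == "__main__":
    # 200 is only an example limit, not a value taken from this system.
    for rate in (100, 200, 400, 1000):
        print(rate, "pps ->", expected_loss(rate, 200))
```

Under that model any loss that remains after raising or disabling the
limit cannot be blamed on ICMP rate limiting.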

> 2) Any IPMI modules installed in that Dell box?

As far as I recall, IPMI support was not added to the PowerEdge servers
until the 2850, which supported IPMI 1.5. This is a 2650. There are no
add-in IPMI cards. There is the built-in DRAC controller, but that does
not share the NIC as happens on the 2850 and 2950 servers. It has its
own dedicated port. Interestingly, it uses 3 IRQs! I even tried
switching IRQs around in the BIOS to make sure the NIC had its own
unshared IRQ, but that made no difference either.

> 3) vmstat -i output?

test# vmstat -i
interrupt                          total       rate
irq1: atkbd0                        4557          0
irq6: fdc0                            10          0
irq14: ata0                           47          0
irq28: bge0                          304          0
irq30: aac0                         2784          0
cpu0: timer                     24371045       1999
Total                           24378747       2000
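
For comparing interrupt counts between runs, the table above can be
parsed with a short script. This is a minimal sketch that assumes the
three-column layout shown; parse_vmstat_i is a hypothetical helper name,
and the sample text is copied from the output above.

```python
import re

SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                        4557          0
irq6: fdc0                            10          0
irq14: ata0                           47          0
irq28: bge0                          304          0
irq30: aac0                         2784          0
cpu0: timer                     24371045       1999
Total                           24378747       2000
"""


def parse_vmstat_i(text):
    """Return {source: (total, rate)} from `vmstat -i` style output."""
    rows = {}
    for line in text.splitlines():
        # The source name is everything before the last two integer columns;
        # the header line has no integers and is skipped automatically.
        m = re.match(r"^(.*\S)\s+(\d+)\s+(\d+)\s*$", line)
        if m:
            rows[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return rows


if __name__ == "__main__":
    for name, (total, rate) in parse_vmstat_i(SAMPLE).items():
        print(f"{name:<16} total={total:>10} rate={rate}")
```

Running it before and after a ping test and diffing the bge0 totals
would show how many interrupts the NIC raised during the test.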

> 4) Is there a switch between the Cisco router and the FreeBSD box?

Yes, in both my production setup and the test setup where I reproduced
the problem.

> 5) If there is a switch between the router and the FreeBSD box, have
>    you tried the pings from a box (not the Cisco) on the same switch
>    segment as the FreeBSD box?

Yes, same loss. I tried from several devices on the segment and see the
loss only to the FreeBSD systems running on that specific hardware.

> 6) Have you tried pings the other way (FreeBSD box -> box#2, and
>    box#2 -> FreeBSD box) to see if it's reproducible that way?

Yes, in fact that is the reason this is such a big problem. I use the
production system to measure latency and packet loss, and since going to
FreeBSD 6.x, it has been showing random packet loss in my data that
isn't actually occurring on the network.

> 7) Does it only happen with ICMP traffic, or can you reproduce the loss
>    using something like FTP (slow transfer rates/stalls)?

It's hard to say. It's definitely with ICMP. I've not noticed it with
other data. I'll have to do more testing.

> 8) Tried downthrottling to 100mbit (ifconfig_bge0="... media 100baseTX")
>    on both sides, to see if it's a gigabit-specific problem?

My test system is running at 100Mb, same hardware, same problem.

> 9) Tried different cabling?  I see the network is gigabit.  You might
>    try replacing the cables, preferably with CAT6.

Tried changing cables, ports, port speed, switches, tried the second NIC
in the system (same model as the first). The only thing I haven't done
is put a third-party NIC like an Intel Pro/100 in the system and try it.
I'm sure it will work ok but I don't know for a fact. I can try that as
a test. I have boxes of them sitting next to my desk here.


Thanks for your time! :)

-- 

Vinny Abello
Network Engineer
vinny@tellurian.com
(973)940-6100
PGP Key Fingerprint: 3BC5 9A48 FC78 03D3 82E0  E935 5325 FBCB 0100 977A

Tellurian Networks - The Ultimate Internet Connection
http://www.tellurian.com (888)TELLURIAN

"Courage is resistance to fear, mastery of fear - not absence of fear"
-- Mark Twain


