Date:      Mon, 30 Nov 2009 09:26:00 +0100
From:      Eirik Øverby <ltning@anduin.net>
To:        Adrian Chadd <adrian@freebsd.org>
Cc:        pyunyh@gmail.com, weldon@excelsusphoto.com, freebsd-current@freebsd.org, Robert Watson <rwatson@freebsd.org>, Gavin Atkinson <gavin@freebsd.org>
Subject:   Re: FreeBSD 8.0 - network stack crashes?
Message-ID:  <C3CC7F37-10BE-41DD-96E4-C952C6434ACC@anduin.net>
In-Reply-To: <d763ac660911292347i74caba25h9861a4d9feb63d77@mail.gmail.com>
References:  <A1648B95-F36D-459D-BBC4-FFCA63FC1E4C@anduin.net> <20091129013026.GA1355@michelle.cdnetworks.com> <74BFE523-4BB3-4748-98BA-71FBD9829CD5@anduin.net> <alpine.BSF.2.00.0911291427240.80654@fledge.watson.org> <E9B13DDC-1B51-4EFD-95D2-544238BDF3A4@anduin.net> <d763ac660911292347i74caba25h9861a4d9feb63d77@mail.gmail.com>


On 30. nov. 2009, at 08.47, Adrian Chadd wrote:

> That URL works for me. So how much traffic is this box handling during
> peak times?

Depends how you define load. It's a storage box (14TB ZFS) with a small handful of NFS clients pushing backup data to it .. So lots of traffic in bytes/sec, but not many clients.


> I've seen this on the proxy boxes that I've setup. There's a lot of
> data being tied up in socket buffers as well as being routed between
> interfaces (ie, stuff that isn't being intercepted.)  Take a look at
> "netstat -an" when things are locked up; see if there's any sockets
> which have full send/receive queues.

If you're referring to the Send-Q and Recv-Q values, they are zero everywhere, as far as I can tell.
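For reference, scanning for non-empty queues by hand gets tedious; a throwaway script can flag them instead. A minimal sketch, assuming the typical FreeBSD `netstat -an` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, state) -- the sample data below is invented, not from this box:

```python
# Sketch: flag sockets whose Recv-Q or Send-Q is non-zero in
# `netstat -an` output. Column positions are assumed from typical
# FreeBSD output; adjust if your netstat formats differently.
def busy_sockets(netstat_output):
    busy = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and non-socket lines (unix sockets, banners).
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        recvq, sendq = int(fields[1]), int(fields[2])
        if recvq or sendq:
            busy.append((fields[0], fields[3], fields[4], recvq, sendq))
    return busy

# Canned example; in practice feed it
# subprocess.check_output(["netstat", "-an"], text=True).
sample = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address      Foreign Address    (state)
tcp4       0  65536 10.0.0.1.2049      10.0.0.2.51234     ESTABLISHED
tcp4       0      0 10.0.0.1.22        10.0.0.3.40112     ESTABLISHED
"""
print(busy_sockets(sample))
```

A stuck connection with a full Send-Q here would point at the backpressure scenario Adrian describes.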


> I'm going to take a complete stab in the dark here and say this sounds
> a little like a livelock. Ie, something is queuing data and allocating
> mbufs for TX (and something else is generating mbufs - I dunno, packet
> headers?) far faster than the NIC is able to TX them out, and there's
> not enough backpressure on whatever (say, the stuff filling socket
> buffers) to stop the mbuf exhaustion. Again, I've seen this kind of
> crap on proxy boxes.

Not sure if this applies in our case. See the (very) end of this mail for some debug/stats output from em1 (the interface currently in use; I disabled lagg/lacp to ease debugging).


> See if you have full socket buffers showing up in netstat -an. Have
> you tweaked the socket/TCP send/receive sizes? I typically lock mine
> down to something small (32k-64k for the most part) so I don't hit
> mbuf exhaustion on very busy proxies.

I haven't touched any defaults except the mbuf clusters. What does your sysctl.conf look like?
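For what it's worth, the 32k-64k clamp Adrian describes would presumably look something like the fragment below in /etc/sysctl.conf. These values are illustrative guesses at his setup, not his actual configuration:

```
# Hypothetical /etc/sysctl.conf fragment clamping the default TCP
# socket buffer sizes to 64k, per the suggestion upthread.
# Illustrative values only -- not a tested configuration.
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
```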


Thanks,
/Eirik


> 2c,
>
>
>
> Adrian
>
> 2009/11/30 Eirik Øverby <ltning@anduin.net>:
>> On 29. nov. 2009, at 15.29, Robert Watson wrote:
>>
>>> On Sun, 29 Nov 2009, Eirik Øverby wrote:
>>>
>>>> I just did that (-rxcsum -txcsum -tso), but the numbers still keep rising. I'll wait and see if it goes down again, then reboot with those values to see how it behaves. But right away it doesn't look too good ..
>>>
>>> It would be interesting to know if any of the counters in the output of netstat -s grow linearly with the allocation count in netstat -m. Often times leaks are associated with edge cases in the stack (typically because if they are in common cases the bug is detected really quickly!) -- usually error handling, where in some error case the unwinding fails to free an mbuf that it should free. These are notoriously hard to track down, unfortunately, but the stats output (especially where delta alloc is linear to delta stat) may inform the situation some more.
>>
>> From what I can tell, all that goes up with mbuf usage is traffic/packet counts. I can't say I see anything fishy in there.
>>
>> From the last few samples in
>> http://anduin.net/~ltning/netstat.log
>> you can see the host stops receiving any packets, but does a few retransmits before the session where this script ran timed out.
>>
>> /Eirik
>>
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>>
>

em1: link state changed to UP
em1: Adapter hardware address = 0xffffff80003be530
em1: CTRL = 0x140248 RCTL = 0x8002
em1: Packet buffer = Tx=20k Rx=12k
em1: Flow control watermarks high = 10240 low = 8740
em1: tx_int_delay = 66, tx_abs_int_delay = 66
em1: rx_int_delay = 32, rx_abs_int_delay = 66
em1: fifo workaround = 0, fifo_reset_count = 0
em1: hw tdh = 25, hw tdt = 25
em1: hw rdh = 222, hw rdt = 221
em1: Num Tx descriptors avail = 256
em1: Tx Descriptors not avail1 = 0
em1: Tx Descriptors not avail2 = 0
em1: Std mbuf failed = 0
em1: Std mbuf cluster failed = 0
em1: Driver dropped packets = 0
em1: Driver tx dma failure in encap = 0
em1: Excessive collisions = 0
em1: Sequence errors = 0
em1: Defer count = 0
em1: Missed Packets = 0
em1: Receive No Buffers = 0
em1: Receive Length Errors = 0
em1: Receive errors = 0
em1: Crc errors = 0
em1: Alignment errors = 0
em1: Collision/Carrier extension errors = 0
em1: RX overruns = 0
em1: watchdog timeouts = 0
em1: RX MSIX IRQ = 0 TX MSIX IRQ = 0 LINK MSIX IRQ = 0
em1: XON Rcvd = 0
em1: XON Xmtd = 0
em1: XOFF Rcvd = 0
em1: XOFF Xmtd = 0
em1: Good Packets Rcvd = 5704113
em1: Good Packets Xmtd = 3617612
em1: TSO Contexts Xmtd = 0
em1: TSO Contexts Failed = 0



