From owner-freebsd-net@FreeBSD.ORG  Mon Aug 23 10:18:00 2010
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 84AC010656A6
	for <freebsd-net@freebsd.org>; Mon, 23 Aug 2010 10:18:00 +0000 (UTC)
	(envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2])
	by mx1.freebsd.org (Postfix) with ESMTP id E74A48FC1F
	for <freebsd-net@freebsd.org>; Mon, 23 Aug 2010 10:17:59 +0000 (UTC)
Received: (qmail 34549 invoked from network); 23 Aug 2010 10:17:07 -0000
Received: from localhost (HELO [127.0.0.1]) ([127.0.0.1])
	(envelope-sender <andre@freebsd.org>)
	by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP
	for <adrian.chadd@gmail.com>; 23 Aug 2010 10:17:07 -0000
Message-ID: <4C724AD9.5020000@freebsd.org>
Date: Mon, 23 Aug 2010 12:18:01 +0200
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
	rv:1.9.2.8) Gecko/20100802 Thunderbird/3.1.2
MIME-Version: 1.0
To: adrian.chadd@gmail.com
References: <AANLkTikrbCFHz-CnuYcgH2JzpeH5hob0Aa2y5dwn3Hvv@mail.gmail.com>	<AANLkTikYMU=wML_z=HDnkUF1PGYMVa1q-QWTrkxD+7EP@mail.gmail.com>	<20100822222746.GC6013@michelle.cdnetworks.com>
	<AANLkTi=t+nG8isp1nf2aBec+FwomApNt0NBPO8LqZ+=9@mail.gmail.com>
In-Reply-To: <AANLkTi=t+nG8isp1nf2aBec+FwomApNt0NBPO8LqZ+=9@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: pyunyh@gmail.com, freebsd-net@freebsd.org
Subject: Re: 8.0-RELEASE-p3: 4k jumbo mbuf cluster exhaustion
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 23 Aug 2010 10:18:00 -0000

On 23.08.2010 11:26, Adrian Chadd wrote:
> On 23 August 2010 06:27, Pyun YongHyeon<pyunyh@gmail.com>  wrote:
>
>> I recall there was SIOCSIFCAP ioctl handling bug in bce(4) on 8.0 so
>> it might also disable IFCAP_TSO4/IFCAP_TXCSUM/IFCAP_RXCSUM when you
>> disabled RX checksum offloading. But I can't explain how checksum
>> offloading could be related with the growth of 4k jumbo buffers.
>
> Neither can I!
>
> I'm trying to come up with a reproduction method that doesn't involve
> "put box on the internet, push clients through it, wait."

Network drivers use 2k-sized mbuf clusters on receive, so the problem
doesn't seem to be RX-related.

The function that is called on a socket write is sosend_generic(), which
makes use of m_getm2().  m_getm2() allocates mbuf chains with the
tightest packing it can achieve.  It will make use of 4k (page size) mbuf
clusters as much as it can.  This is where they come from.
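
The packing behaviour described above can be sketched as a small userland
model.  This is only an illustration, not the kernel's m_getm2(): the name
plan_chain(), the struct chain_plan type and the 200-byte small-mbuf
payload are assumptions made for the example, and the real thresholds
depend on the platform and kernel configuration.

```c
#include <stddef.h>

/* Assumed sizes for illustration; real values are platform dependent. */
#define SMALL_MBUF_DATA 200   /* hypothetical payload of a plain mbuf */
#define CLUSTER_2K      2048  /* standard mbuf cluster */
#define CLUSTER_4K      4096  /* page-size jumbo cluster */

/* Hypothetical summary of the buffers needed to hold one write. */
struct chain_plan {
	size_t n4k;     /* number of 4k clusters */
	size_t n2k;     /* number of 2k clusters */
	size_t nsmall;  /* number of plain mbufs */
};

/*
 * Pick the largest buffer that the remaining length justifies, so that
 * big writes are carried almost entirely in 4k clusters.
 */
static struct chain_plan
plan_chain(size_t len)
{
	struct chain_plan p = { 0, 0, 0 };

	while (len > 0) {
		size_t cap;

		if (len > CLUSTER_2K) {
			cap = CLUSTER_4K;
			p.n4k++;
		} else if (len > SMALL_MBUF_DATA) {
			cap = CLUSTER_2K;
			p.n2k++;
		} else {
			cap = SMALL_MBUF_DATA;
			p.nsmall++;
		}
		len = (len > cap) ? len - cap : 0;
	}
	return (p);
}
```

Under these assumed sizes, plan_chain(8192) needs two 4k clusters and
nothing else, while plan_chain(1460) needs a single 2k cluster.  A box
that mostly does large socket writes would therefore draw mainly on the
4k zone on the send side.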

It seems the 4k clusters do not get freed back to the pool after they've
been sent by the NIC and dropped from the socket buffer once the ACK has
arrived.  The leak must occur in one of these two places.  A leak in the
socket buffer code is unlikely, as it would affect not just you but
everyone else too.  Thus the mbuf freeing after DMA/tx in the bce(4)
driver is the prime suspect.

-- 
Andre