Message-ID: <4C72CFD0.2000005@freebsd.org>
Date: Mon, 23 Aug 2010 21:45:20 +0200
From: Andre Oppermann
To: pyunyh@gmail.com
Cc: adrian.chadd@gmail.com, freebsd-net@freebsd.org
Subject: Re: 8.0-RELEASE-p3: 4k jumbo mbuf cluster exhaustion
References: <20100822222746.GC6013@michelle.cdnetworks.com> <4C724AD9.5020000@freebsd.org> <20100823175220.GB1116@michelle.cdnetworks.com> <4C72C622.2070302@freebsd.org> <20100823191634.GE1116@michelle.cdnetworks.com>
In-Reply-To: <20100823191634.GE1116@michelle.cdnetworks.com>

On 23.08.2010 21:16, Pyun YongHyeon wrote:
> On Mon, Aug 23, 2010 at 09:04:02PM +0200, Andre Oppermann wrote:
>> On 23.08.2010 19:52, Pyun YongHyeon wrote:
>>> On Mon, Aug 23, 2010 at 12:18:01PM +0200, Andre Oppermann wrote:
>>>> The function that is called on a socket write is sosend_generic() which
>>>> makes use of m_getm2(). This function allocates mbuf chains with the
>>>> tightest packing it can achieve. It will make use of 4k (page size) mbufs
>>>> as much as it can. This is where they come from.
>>>>
>>>> It seems the 4k clusters do not get freed back to the pool after they've
>>>> been sent by the NIC and dropped from the socket buffer after the ACK has
>>>> arrived. The leak must occur in one of these two places. The socket
>>>> buffer is unlikely as it would affect not just you but everyone else too.
>>>> Thus the mbuf freeing after DMA/tx in the bce(4) driver is the prime
>>>> suspect.
>>>>
>>>
>>> I know bce(4) has a couple of bugs in the TX path (wrong dma tag, lack of
>>> bus_dmamap_sync(9) etc) but this is the same code path with/without
>>> TX checksum offloading. This is one of the reasons why I still do not
>>> understand what's really happening here. TX checksum offloading may
>>> introduce additional frame processing time to fill the internal FIFO to
>>> compute the checksum before transmitting the frame to the wire, such that
>>> it can change the timing of the TX path. This timing change might trigger
>>> the TX path bug. It's just vague guessing though.
>>
>> Had a chat with Claudio@OpenBSD and he said that the bce(4) DMA engine
>> can only access the first 1GB of physical RAM and has to use bounce
>> buffers all the time. Maybe this is related.
>>
>
> Really? I don't remember seeing such a DMA address space limitation
> in the data sheet. And I don't think Broadcom made such a horrible
> thing for controllers targeted for servers. The only limitation I
> know of is that BCM5708 is not able to handle DMA addresses greater than
> 40 bits, so bce(4) limits the DMA address space in DMA tag creation.

Oops... OpenBSD bce(4) != FreeBSD bce(4). The former is for BCM440x
chips, the latter for BCM57xx.

--
Andre
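
For illustration only, the two DMA address limits mentioned above (first 1GB
for the BCM440x chips, 40 bits for BCM5708) can be modelled in plain userland
C. This is a rough sketch, not driver code: dma_reachable(), needs_bounce()
and the DMA_LIMIT_* names are hypothetical and do not exist in bce(4) or
bus_dma(9). It only shows why a buffer above the limit would have to go
through a bounce buffer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative limits taken from the figures quoted in this thread. */
#define DMA_LIMIT_1GB   ((uint64_t)1 << 30)  /* "first 1GB", BCM440x */
#define DMA_LIMIT_40BIT ((uint64_t)1 << 40)  /* 40-bit addresses, BCM5708 */

/*
 * Hypothetical helper: true if the physical range [paddr, paddr + len)
 * lies entirely below 'limit', i.e. the device could DMA to it directly.
 * Written so that paddr + len cannot overflow.
 */
static bool
dma_reachable(uint64_t paddr, uint64_t len, uint64_t limit)
{
	if (paddr > limit)
		return (false);
	return (len <= limit - paddr);
}

/* Hypothetical helper: true if the range would need a bounce buffer. */
static bool
needs_bounce(uint64_t paddr, uint64_t len, uint64_t limit)
{
	return (!dma_reachable(paddr, len, limit));
}
```

With a 1GB limit, a 4k cluster at physical address 0x3FFFF800 straddles the
boundary and would need bouncing, while with a 40-bit limit a cluster at
0x40000000 is directly reachable.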