Date: Sat, 6 Jun 2009 08:25:11 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Barney Cordoba
Cc: net@freebsd.org
Subject: Re: panic in sbflush
In-Reply-To: <11451.10207.qm@web63902.mail.re1.yahoo.com>
List-Id: Networking and TCP/IP with FreeBSD

On Fri, 5 Jun 2009, Barney Cordoba wrote:

> I'm getting a panic in sbflush where mbcnt is 0 and sb_mb is not empty.
> Any clues as to what might cause this? It's happening during a load test.

sbflush() panics are typically symptoms of bugs elsewhere in the network
stack or kernel, often race conditions. In essence, sbflush() is called when
a socket is closed and packets have to be drained from the receive socket
buffer. During that draining, we sanity-check that the cached length of the
data in the socket buffer (sb_cc) matches the actual length of the data in
the buffer. If sb_cc or sb_mbcnt is non-zero, or sb_mb is non-NULL, at the
end of the function, we panic.
Most of the time it's a driver race condition: an mbuf has been injected into
the stack using ifp->if_input(), but the driver has then modified the mbuf
after injection (perhaps by setting a length, clearing a pointer, etc.). We
had a spate of these after we moved to direct dispatch, because the timing
changed: packets were now processed before if_input() returned rather than
"some time later".

Once in a while it's a bug in TCP or socket buffer handling, or in some
intermediate encapsulation/decapsulation layer, along similar lines to the
driver race scenario. I think the most recent case I'm aware of was actually
a socket buffer bug, but that's fairly unusual in the history of reports of
this panic.

There is a kernel debugging option, "options SOCKBUF_DEBUG", that performs
run-time sanity checking of the sockbuf structure so that the corruption is
found earlier. My experience is that it's good at finding deterministic
socket buffer corruption bugs, but it changes the timing significantly, so it
tends to mask narrow race conditions of the "inject the packet and then
change it" variety.

Hope that helps,

Robert N M Watson
Computer Laboratory
University of Cambridge