Date: Sat, 6 Jun 2009 08:25:11 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Barney Cordoba
Cc: net@freebsd.org
Subject: Re: panic in sbflush
In-Reply-To: <11451.10207.qm@web63902.mail.re1.yahoo.com>
List-Id: Networking and TCP/IP with FreeBSD

On Fri, 5 Jun 2009, Barney Cordoba wrote:

> I'm getting a panic in sbflush where mbcnt is 0 and sb_mb is not empty.
> Any clues as to what might cause this? It's happening during a load test.

sbflush() panics are typically symptoms of bugs elsewhere in the network
stack or kernel, often race conditions. In essence, sbflush() is called when
a socket is closed and packets have to be drained from the receive socket
buffer. During that draining, we sanity-check that the cached length of the
data in the socket buffer (sb_cc) matches the actual length of the data in
the buffer. If sb_cc or sb_mbcnt is non-zero, or sb_mb is non-NULL, at the
end of the function, we panic.
Most of the time it's a driver race condition: an mbuf has been injected into
the stack using ifp->if_input(), but the driver has then modified the mbuf
after injection (perhaps by setting a length, clearing a pointer, etc.). We
had a spate of these after we moved to direct dispatch, because the timing
changed: packets were now processed before if_input() returned rather than
"some time later".

Once in a while it's a bug in TCP or socket buffer handling, or in some
intermediate encapsulation/decapsulation layer, along similar lines to the
driver race scenario. I think the most recent case I'm aware of was actually
a socket buffer bug, but that's fairly unusual in the history of reports of
this panic.

There is a kernel debugging option, "options SOCKBUF_DEBUG", that performs
run-time sanity checking of the sockbuf structure so that the corruption is
found earlier. My experience is that it's good at finding deterministic
socket buffer corruption bugs, but it changes the timing significantly, so it
tends to mask narrow race conditions of the "inject the packet and then
change it" variety.

Hope that helps,

Robert N M Watson
Computer Laboratory
University of Cambridge