Date: Sat, 29 Oct 2005 19:43:40 +0100 (BST)
From: Robert Watson
To: Carl Makin
Cc: freebsd-stable@freebsd.org
Subject: Re: 5.x/6.x network stability
Message-ID: <20051029193927.C20147@fledge.watson.org>
In-Reply-To: <436168E9.8080907@xena.ipaustralia.gov.au>

On Fri, 28 Oct 2005, Carl Makin wrote:

> I've been having a heap of trouble with the primary network interface on
> a box that was running 5.4 and recently upgraded to 6.0-Beta5 where the
> interface would just go dead. Nothing in ifconfig or syslog or dmesg
> would indicate a problem, but nothing would go in or out. The only way
> to fix it was reboot.
>
> A week ago, after searching the mailing lists I realised it might be the
> fact that I was using Netatalk and that might not be MP safe so I set
> debug.mpsafenet="0" in /boot/loader.conf and the box has been stable
> ever since.
>
> Is anyone looking at the kernel Netatalk code? Is this likely to be the
> real reason for the problem?

I've not seen any reports of problems, but have had my hands in it
recently. I'm happy to help try and debug the issues, but my preference
(if possible) would be to do this on 6.x and then backport fixes to 5.x.
While the netatalk code does see testing, it's not all that widely used,
and so it's possible there are lurking issues.

netatalk is, in theory, MPSAFE, but there could be lasting race
conditions. debug.mpsafenet puts Giant back over the stack, but also
substantially changes the timing, so a race condition in a device driver
or the socket code could also be indicated. Could you:

- Submit a PR describing the details.

- Include output from dmesg, ifconfig, and any other information you
  think would be useful. Indicate which interface is the one that is
  hanging.

- Compile the kernel with INVARIANTS, INVARIANT_SUPPORT, WITNESS, DDB, and
  BREAK_TO_DEBUGGER. See if you get any debugging warnings around when
  the hang occurs. Note: these options have a large performance impact.

- Once the interface is dead, can you use it for IP traffic?

- Once the interface is dead, if you run tcpdump on it, do you see
  traffic?

- Once the interface is dead, if you generate traffic, do other hosts see
  it?

- If you generate traffic, does tcpdump see your own traffic?

Thanks,

Robert N M Watson
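
For reference, the debugging options requested above could be added to a
custom kernel configuration file roughly as sketched below. This is only an
illustration: the KDB line is an assumption not mentioned in the message
(on 5.x/6.x DDB is normally built as a backend of the KDB framework), and
the exact set of options should be checked against the NOTES files for the
release in use.

```conf
# Debugging options named in the message above.
# These add significant overhead and are meant for tracking down the hang,
# not for long-term production use.
options 	INVARIANTS		# Enable extra kernel consistency checks
options 	INVARIANT_SUPPORT	# Support code required by INVARIANTS
options 	WITNESS			# Lock order verification and reporting

# Assumption: KDB is not listed in the message, but DDB is usually
# enabled together with it on 5.x/6.x kernels.
options 	KDB			# Kernel debugger framework
options 	DDB			# Interactive in-kernel debugger
options 	BREAK_TO_DEBUGGER	# A BREAK on the console enters the debugger
```

After rebuilding and booting the new kernel, any WITNESS or INVARIANTS
warnings printed around the time of the hang would be worth including in
the PR along with the dmesg and ifconfig output.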