Date: Mon, 10 Nov 2003 13:31:01 -0500 (EST)
From: Robert Watson
To: Matt Smith
Cc: Soren Schmidt, current@freebsd.org
Subject: Re: Still getting NFS client locking up

On Mon, 10 Nov 2003, Matt Smith wrote:

> I can certainly spend some time trying to get some proper debug based on
> what you have said in your email. I shall look into setting up a serial
> console etc.
>
> In the meantime another piece of information which might be helpful is
> this. Looking at the wtmp to see when I rebuilt my world/kernel I can
> see this:
>
> reboot ~ Tue Oct 21 20:44
> reboot ~ Wed Oct 15 19:36
>
> (These times are in BST which is +5 hours from east coast US).
>
> On the Oct 15th kernel NFS was working perfectly (and before that).
> From the Oct 21st kernel it has always locked up in this way. So something
> between those two dates was committed which broke this for us. Another
> way of me debugging this I guess is to backtrack my world to each date
> in between systematically and find the exact date it breaks and look at
> the commits.

Hmm. One other thing that might be worth trying, though it is pretty
time-consuming, is narrowing down the specific kernel change after which
the failures started. Typically this is done with a binary search: start
with two dates, one whose kernel works and one whose kernel doesn't, test
a kernel from the date halfway between them, keep whichever half still
contains the transition, and repeat until you're down to a range of
commits small enough to inspect individually. That would let us identify
suspect changes that could then be backed out locally, one at a time, to
pin it down.

The categories of commits most likely worth looking at include:

(1) Changes specifically to the network drivers that you're using.
(2) Changes to the network stack, especially relating to locking and
    timeouts.
(3) Changes to the NFS client and server code.
(4) Changes in general to VFS and buffer cache locking.

We've had a lot of commits in all of these categories, so narrowing down
the range would be a big help in figuring this out.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Network Associates Laboratories
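The date bisection described above can be sketched roughly as follows. This is an illustrative sketch only, assuming day granularity and a hypothetical predicate `kernel_works(day)` that stands in for the manual work of checking out sources as of that date, building world/kernel, booting, and exercising the NFS client. Neither `bisect_regression` nor `kernel_works` is an existing FreeBSD tool. Each loop iteration halves the remaining window, so the 6-day gap between Oct 15 and Oct 21 needs at most 3 test builds.

```python
from datetime import date, timedelta
from typing import Callable


def bisect_regression(good: date, bad: date,
                      kernel_works: Callable[[date], bool]) -> tuple[date, date]:
    """Narrow the window [good, bad] down to two adjacent days.

    Invariant: a kernel built from sources as of `good` works, and one
    built as of `bad` does not. Each step tests the midpoint and keeps
    whichever half still brackets the breaking change.
    """
    if good >= bad:
        raise ValueError("good date must precede bad date")
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        # Hypothetical predicate: build, boot, and test the NFS client.
        if kernel_works(mid):
            good = mid
        else:
            bad = mid
    # The commits made between these two days are the suspects.
    return good, bad


if __name__ == "__main__":
    # Simulated run: pretend the breakage landed on Oct 18 (made-up value).
    broken_since = date(2003, 10, 18)
    print(bisect_regression(date(2003, 10, 15), date(2003, 10, 21),
                            lambda d: d < broken_since))
    # prints (datetime.date(2003, 10, 17), datetime.date(2003, 10, 18))
```

The returned pair bounds a single day of commits, which can then be inspected against the categories listed above and backed out one at a time.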