Date: Thu, 29 Jan 2009 22:38:57 +0000 (GMT)
From: Robert Watson
To: freebsd-stable@freebsd.org
Cc: Pete French
Subject: Re: Big problems with 7.1 locking up :-(

On Fri, 9 Jan 2009, Pete French wrote:

> I have a number of HP 1U servers, all of which were running 7.0 perfectly
> happily. I have been testing 7.1 in its various incarnations for the last
> couple of months on our test server and it has performed perfectly.
>
> So the last two days I have been round upgrading all our servers, knowing
> that I had run the system stably on identical hardware for some time.

For those following this other than Pete, with whom I've been in private
correspondence: it seems that he is running into two different deadlocks in
the routing code.
One of them (at least) is triggered by a lock order problem in the
processing of ICMP redirects. Redirects are uncommon in most configurations,
but there are quite a few on his network, so the deadlock triggers quickly
under load. Kip Macy has corrected at least one (possibly both) of the
problems in head, and plans to MFC the fixes in the near future. We'll follow
up once the fixes are merged, and again if any further problems transpire.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> Since then I have started seeing machines lock up. This always happens under
> heavy disc load. When I bring the machine back up then sometimes it fails
> to fsck due to a partially truncated inode. The lockups appear to
> be disc related - on my mysql master machine it will come back up with
> files somewhat shorter than those which have already been transmitted to
> the slave (i.e. some data was in memory, and claimed to have been written
> to the drive, but never made it onto the disc).
>
> The only time I have seen anything useful on the screen was during one lockup
> where I got a message about a spin lock being held too long and some
> comment in parentheses about it being a turnstile lock.
>
> Help! :-(
>
> I am now downgrading all the machines to 7.0 as fast as I can - though the
> machine I am trying to compile it on has locked up once during the compile
> so I haven't got anywhere so far.
>
> The machines are HP Proliant DL360 G5s - they have an embedded P400i
> RAID controller with a pair of mirrored drives connected. Each one has
> both ethernets connected, bundled using lagg and LACP.
>
> Advice ?
>
> -pete.