Date:      Fri, 09 Jan 2009 08:49:42 -0600
From:      Guy Helmer <ghelmer@palisadesys.com>
To:        Pete French <petefrench@ticketswitch.com>, freebsd-stable@freebsd.org
Subject:   Re: Big problems with 7.1 locking up :-(
Message-ID:  <49676406.9050902@palisadesys.com>
In-Reply-To: <E1LL6dg-0007CN-DI@dilbert.ticketswitch.com>
References:  <E1LL6dg-0007CN-DI@dilbert.ticketswitch.com>

Pete French wrote:
> I have a number of HP 1U servers, all of which were running 7.0
> perfectly happily. I have been testing 7.1 in its various incarnations
> for the last couple of months on our test server and it has performed
> perfectly.
>
> So the last two days I have been round upgrading all our servers, knowing
> that I had run the system stably on identical hardware for some time.
>
> Since then I have started seeing machines lock up. This always happens under
> heavy disc load. When I bring the machine back up then sometimes it fails
> to fsck due to a partially truncated inode. The lockups appear to
> be disc related - on my mysql master machine it will come back up with
> files somewhat shorter than those which have already been transmitted to
> the slave (i.e. some data was in memory, and claimed to have been written
> to the drive, but never made it onto the disc).
>
> The only time I have seen anything useful on the screen was during one lockup
> where I got a message about a spin lock being held too long and some
> comment in parentheses about it being a turnstile lock.
>
> Help! :-(
>
> I am now downgrading all the machines to 7.0 as fast as I can - though the
> machine I am trying to compile it on has locked up once during the compile
> so I haven't got anywhere so far.
>
> The machines are HP Proliant DL360 G5s - they have an embedded P400i
> RAID controller with a pair of mirrored drives connected. Each one has
> both ethernets connected, bundled using lagg and LACP.
>
>   
I can't tell whether my situation is related, but I am seeing lockups on 
SMP Supermicro servers with both older (NetBurst-ish) and current Xeon 
CPUs.  I have been dropping into the kernel debugger and getting lock 
information and process backtraces, but so far nothing has been 
conclusively identified.  I think the issue I'm seeing was introduced 
sometime between October 2 and November 24 in the RELENG_7 branch, and I 
suppose the next step is to do a binary search for the offending change.
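
For illustration, the binary search over that window could be sketched as
below.  This is only a rough sketch under stated assumptions: the year
(2008) is inferred from the date of this message, the test function
locks_up() is a hypothetical stand-in for "check out RELENG_7 as of that
date, build, boot, and run the disk load", and the November 3 date inside
it is made up purely so the example runs.  It also assumes that every
snapshot after the first bad one is bad as well.

```python
from datetime import date, timedelta


def first_bad_date(good, bad, is_bad):
    """Return the earliest date in (good, bad] for which is_bad() is true.

    Assumes `good` is a known-good snapshot, `bad` is a known-bad one,
    and that once the problem appears it stays present.
    """
    while (bad - good).days > 1:
        # Test the snapshot halfway between the two known endpoints.
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid    # problem already present: look earlier
        else:
            good = mid   # still healthy: look later
    return bad


def locks_up(snapshot):
    # Hypothetical placeholder; a real test would build and stress a
    # kernel from the sources as of `snapshot`.
    return snapshot >= date(2008, 11, 3)


culprit = first_bad_date(date(2008, 10, 2), date(2008, 11, 24), locks_up)
print(culprit)
```

With a 53-day window this needs at most six build-and-test cycles, which
matters when each cycle means reproducing a lockup under load.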

Guy

-- 
Guy Helmer, Ph.D.
Chief System Architect
Palisade Systems, Inc.



