Date:      Wed, 25 Apr 2007 11:53:32 +1000
From:      "Jan Mikkelsen" <janm@transactionware.com>
To:        "'LI Xin'" <delphij@delphij.net>, "'Kostik Belousov'" <kostikbel@gmail.com>
Cc:        freebsd-stable@freebsd.org
Subject:   RE: 6.2-STABLE deadlock?
Message-ID:  <002b01c786dc$87b56e50$0502a8c0@IBMA618C20271E>
In-Reply-To: <462DDB4D.8080507@delphij.net>

LI Xin wrote:
> Kostik Belousov wrote:
> > On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> >> On Tue, Mar 13, 2007 at 02:08:48PM +0000, Adrian Wontroba wrote:
> >>> At work, amongst my stable of old computers running FreeBSD, I have a
> >>> Fujitsu M800 - a 4-way Xeon SMP machine with 4 GB of memory. This
> >>> primarily runs Nagios and a small and lightly used MySQL database, along
> >>> with a few inbound FTP transfers per minute. It has a Mylex card based
> >>> disc subsystem, ruling out crash dumps.
> >>>
> >>> At some point during 5.5-STABLE this machine started to occasionally hang ...
> >> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> >> rather sooner after the hang.  Processes with wmesg=ufs feature often in
> >> the ps output.
> >>
> >> http://www.stade.co.uk/crash1/
> > 
> > I would suspect the mlx controller. There are several processes (for
> > instance, 988, 50918) waiting for completion of a block read, and the
> > processes in the "ufs" state are the result of the lock cascade, IMHO.
> 
> I'm not very sure if this is specific to one disk controller.  Actually
> I got some occasional reports about similar hangs on amd64 6.2-RELEASE
> (slightly patched version) where most processes were stuck in the 'ufs'
> state under very light load; the box was equipped with amr(4) RAID.
> 
> I was not able to reproduce the problem in my lab, though, and it is still
> unknown how to trigger the livelock :-(  Still need some
> investigation on their production system.

I have seen something similar once, on a machine with an Areca (arcmsr)
controller, running 6.2-RELEASE (with unionfs patches).  Processes were stuck
in "ufs", and the machine needed physical intervention to reboot.  I haven't
seen it since.  From memory, it happened during startup of the applications
and jails on the machine.

Jan.



