From owner-freebsd-stable@FreeBSD.ORG Wed Apr 25 03:43:05 2007
Date: Tue, 24 Apr 2007 23:43:03 -0400
From: Kris Kennaway
To: Jan Mikkelsen
Cc: 'Kostik Belousov', freebsd-stable@freebsd.org, 'LI Xin'
Subject: Re: 6.2-STABLE deadlock?
Message-ID: <20070425034303.GA44054@xor.obsecurity.org>
References: <462DDB4D.8080507@delphij.net> <002b01c786dc$87b56e50$0502a8c0@IBMA618C20271E>
In-Reply-To: <002b01c786dc$87b56e50$0502a8c0@IBMA618C20271E>
List-Id: Production branch of FreeBSD source code

On Wed, Apr 25, 2007 at 11:53:32AM +1000, Jan Mikkelsen wrote:
> LI Xin wrote:
> > Kostik Belousov wrote:
> > > On Mon, Apr 23, 2007 at 03:56:32AM +0100, Adrian Wontroba wrote:
> > >> On Tue, Mar 13, 2007 at 02:08:48PM +0000, Adrian Wontroba wrote:
> > >>> At work, amongst my stable of old computers running FreeBSD, I have a
> > >>> Fujitsu M800 - a 4 Xeon SMP processor with 4 GB of memory.
> > >>> This primarily runs Nagios and a small and lightly used MySQL database,
> > >>> along with a few inbound FTP transfers per minute. It has a Mylex card
> > >>> based disc subsystem, ruling out crash dumps.
> > >>>
> > >>> At some point during 5.5-STABLE this machine started to occasionally
> > >>> hang ...
> > >> Another 6-STABLE (cvsupped on 27/03/07) example, with diagnostics taken
> > >> rather sooner after the hang. Processes with wmesg=ufs feature often in
> > >> the ps output.
> > >>
> > >> http://www.stade.co.uk/crash1/
> > >
> > > I would suspect the mlx controller. There are several processes (for
> > > instance, 988, 50918) waiting for completion of block read, and processes
> > > in the "ufs" states are the result of the lock cascade, IMHO.
> >
> > I'm not very sure if this is specific to one disk controller. Actually
> > I got some occasional reports about similar hangs on amd64 6.2-RELEASE
> > (slightly patched version) where most of the processes were stuck in the
> > 'ufs' state, under very light load; the box was equipped with amr(4) RAID.
> >
> > I was not able to reproduce the problem at my lab, though, so it's still
> > unknown how to trigger the livelock :-( Still need to investigate
> > on their production system.
>
> I have seen something similar once, on a machine with an Areca (arcmsr)
> controller, running 6.2-RELEASE (with unionfs patches). Processes stuck in
> "ufs", and the machine needed physical intervention to reboot. I haven't
> seen it since. From memory, it happened during startup of the applications
> and jails on the machine.

Sounds like one of the known unionfs bugs.

Kris