Date:      Sun, 06 Dec 2009 10:54:15 +0100
From:      Arnaud Houdelette <arnaud.houdelette@tzim.net>
To:        Peter Jeremy <peterjeremy@acm.org>
Cc:        freebsd-current@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: Non-responsive 8.0-RC1 (now 8.0-STABLE)
Message-ID:  <4B1B7F47.7090405@tzim.net>
In-Reply-To: <20091205224826.GA92509@server.vk2pj.dyndns.org>
References:  <20091128212226.GA9841@server.vk2pj.dyndns.org>	<3ABF47F1-86EC-4CF2-9D42-86344D0F455B@exscape.org>	<20091130081330.GA2202@server.vk2pj.dyndns.org> <20091205224826.GA92509@server.vk2pj.dyndns.org>

Peter Jeremy wrote:
> On 2009-Nov-30 19:13:30 +1100, Peter Jeremy <peter@server.vk2pj.dyndns.org> wrote:
>   
>> On 2009-Nov-29 08:56:55 +0100, Thomas Backman <serenity@exscape.org> wrote:
>>     
>>> On Nov 28, 2009, at 10:22 PM, Peter Jeremy wrote:
>>>
>>>       
>>>> My main server is running 8.0/amd64 from between RC1 and RC2 and I've
>>>> recently had a couple of long-duration hangs on it during which time
>>>> processes doing I/O will stop responding.
>>>>         
> ...
>   
>> It actually "hung" again just after I sent the original mail.  This
>> time I managed to get console access and could check the kernel state.
>> This showed that a number of processes were blocked on ZFS locks.
>> The most commonly reported state was 'tx->tx_quiesce_done_cv)'.
>>     
>
> I've upgraded to 8-STABLE from 30-Nov and the problem is still present,
> even after disabling the boinc processes.
>
> This seems to leave race conditions inside ZFS as the only option.
>
> Has anyone else seen anything like this?
>
>   
I have had the same issue since I upgraded to 8.0-RELEASE. It happens 
during high I/O operations such as buildworld. Since I run top in an ssh 
session, I can say that before the hang the [zfskern] process shows high 
CPU usage and overall system CPU usage reaches 99%. Sometimes I can get 
back to normal by breaking the build with Ctrl-C; sometimes I can't. If 
enabled, the watchdog kicks in and the machine reboots (otherwise, I 
just lose ssh control over it).
The machine has little memory (512 MB) and uses the same tuning I used 
on 7.2 (ARC reduced to 60M, device cache to 5M), which gave me a stable 
machine there.
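Assuming the "device cache" above refers to the ZFS vdev read cache, that 
tuning would look roughly like this in /boot/loader.conf (a sketch based 
on the 8.0-era tunable names, not a copy of the actual file):

```
# /boot/loader.conf (sketch; assumes 8.0-era ZFS tunable names)
vfs.zfs.arc_max="60M"          # cap the ARC at 60 MB
vfs.zfs.vdev.cache.size="5M"   # assumption: "device cache" = per-vdev read cache
```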
I have enabled crash dumps and can investigate if somebody gives me 
pointers on where to look.
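Assuming crash dumps and the watchdog are enabled through the stock 
rc.conf knobs (how they were actually set up is not stated), the relevant 
/etc/rc.conf lines would be something like:

```
# /etc/rc.conf (sketch)
dumpdev="AUTO"          # write a kernel dump to swap on panic; savecore(8) saves it on reboot
watchdogd_enable="YES"  # start watchdogd(8) to pat the watchdog
```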

Arnaud


