Date:      Wed, 16 Dec 2009 18:09:46 +0100
From:      Arnaud Houdelette <arnaud.houdelette@tzim.net>
To:        Ben Kelly <ben@wanderview.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Possible ZFS livelock or SCHED_ULE bug ?
Message-ID:  <4B29145A.4080601@tzim.net>
In-Reply-To: <F864C01F-3E81-461C-9E90-964608F189BC@wanderview.com>
References:  <4B290515.5080909@tzim.net> <F864C01F-3E81-461C-9E90-964608F189BC@wanderview.com>

Ben Kelly wrote:
> On Dec 16, 2009, at 11:04 AM, Arnaud Houdelette wrote:
>
>   
>> Hi all!
>> I have a uniprocessor AMD64 box with 512 MB of RAM and 2 ZFS pools, used as a home NAS.
>>
>> I have had some I/O issues since I moved from 7.2 to 8.0.
>> With a GENERIC kernel (or a stripped-down one), during high I/O activity (such as a make buildworld can cause), I encounter random hangs or deadlocks.
>> top shows system CPU usage at 99%, the most CPU-intensive process being [zfskern] ({txg_thread_enter} if I switch to thread view).
>> The box still responds to ping. Processes that are already running keep running, but I can't start new ones.
>> Sometimes I can return to normal by Ctrl-C-ing the buildworld (or other operation); sometimes I can't and have to reboot the box.
>>
>> The issue seems to be less frequent with 8.0-STABLE than with 8.0-RELEASE, but it is still present (roughly a 75% chance of a hang during a buildworld).
>> I get the hang with prefetch enabled or disabled. The same goes for the ZIL.
>>
>> I tried to enable kernel dumps, but the box hangs while saving the dump (root is on ZFS) or when starting kgdb on it.
>> I recompiled the kernel with SCHED_4BSD, and with it I can't seem to reproduce the hang.
>>
>> What do you think?
>> Did I misconfigure something?
>>     
>
> This sounds similar to something I ran into on CURRENT last year:
>
>   http://docs.freebsd.org/cgi/getmsg.cgi?fetch=832196+0+archive/2009/freebsd-current/20090322.freebsd-current
>
> The immediate problem was a priority inversion problem between the txg_thread_enter threads and the spa_zio threads.  This should be solved (or at least mitigated) on 8.0 now that these threads have explicit priorities set.  Can you check to see what priorities these threads are at on your machine?  They should have priorities something like -8 for txg_thread_enter and -16 for spa_zio.
>
> - Ben
>   

As far as I can tell, these are the priorities I see on my machine.
I'm running another test, this time with ULE but without options SMP set.
I'm currently building world, and so far I have not encountered any
hang (and the system seems more responsive than with 4BSD). I'll keep
testing and report back.

Arnaud


