From owner-freebsd-current@FreeBSD.ORG Thu Jun 18 11:50:08 2009
Return-Path:
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8E962106564A
	for ; Thu, 18 Jun 2009 11:50:08 +0000 (UTC)
	(envelope-from serenity@exscape.org)
Received: from ch-smtp01.sth.basefarm.net (ch-smtp01.sth.basefarm.net [80.76.149.212])
	by mx1.freebsd.org (Postfix) with ESMTP id 202598FC13
	for ; Thu, 18 Jun 2009 11:50:07 +0000 (UTC)
	(envelope-from serenity@exscape.org)
Received: from c83-253-252-234.bredband.comhem.se ([83.253.252.234]:37948 helo=mx.exscape.org)
	by ch-smtp01.sth.basefarm.net with esmtp (Exim 4.69)
	(envelope-from )
	id 1MHG7p-0008Rb-3S; Thu, 18 Jun 2009 13:49:51 +0200
Received: from [192.168.1.5] (macbookpro [192.168.1.5])
	(using TLSv1 with cipher AES128-SHA (128/128 bits))
	(No client certificate requested)
	by mx.exscape.org (Postfix) with ESMTPSA id 6DFDC6B161;
	Thu, 18 Jun 2009 13:49:47 +0200 (CEST)
Message-Id: <993B7B5B-1B6B-48A5-8425-6A1D071335A9@exscape.org>
From: Thomas Backman
To: Artem Belevich
In-Reply-To:
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0 (Apple Message framework v935.3)
Date: Thu, 18 Jun 2009 13:49:46 +0200
References:
X-Mailer: Apple Mail (2.935.3)
X-Originating-IP: 83.253.252.234
X-Scan-Result: No virus found in message 1MHG7p-0008Rb-3S.
X-Scan-Signature: ch-smtp01.sth.basefarm.net 1MHG7p-0008Rb-3S 4704e28427d1bfb6cd99c4517d6480e2
Cc: freebsd-current@freebsd.org
Subject: Re: ZFS : panic("sleeping thread")
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 18 Jun 2009 11:50:08 -0000

On May 27, 2009, at 07:58 PM, Artem Belevich wrote:

> Hi,
>
> While recent ZFS improvements got rid of random hangs I used to see,
> there's still one problem that I keep running into -- panic in ZFS
> under heavy load. I can reproduce it by doing a build with -j16 in a
> jail running i386 binaries on -CURRENT/amd64 running on a box with
> quad-core CPU. It takes a while to reproduce, but it usually shows up
> within couple of hours.
>
> Sleeping thread (tid 100606, pid 32147) owns a non-sleepable lock
> sched_switch() at sched_switch+0xed
> mi_switch() at mi_switch+0x16f
> sleepq_wait() at sleepq_wait+0x42
> _sx_xlock_hard() at _sx_xlock_hard+0x1f0
> _sx_xlock() at _sx_xlock+0x4e
> rrw_exit() at rrw_exit+0x1d
> zfs_freebsd_getattr() at zfs_freebsd_getattr+0x2be
> VOP_GETATTR_APV() at VOP_GETATTR_APV+0x44
> filt_vfsread() at filt_vfsread+0x51
> knote() at knote+0xc2
> VOP_WRITE_APV() at VOP_WRITE_APV+0x11f
> vn_write() at vn_write+0x279
> dofilewrite() at dofilewrite+0x85
> kern_writev() at kern_writev+0x60
> write() at write+0x54
> ia32_syscall() at ia32_syscall+0x236
> Xint0x80_syscall() at Xint0x80_syscall+0x85
> --- syscall (4, FreeBSD ELF32, write), rip = 0x78162153, rsp =
> 0xffff945c, rbp = 0xffff9478 ---
>
> It appears that locking within ZFS conflicts with vnode locking. The
> back-trace is always the same.
>
> For now, I've applied following patch to disable the panic, but it
> would be good if someone familiar with VFS locking in FreeBSD could
> take a look.
> If you need any additional info, let me know.
>
> Thanks,
> --Artem
>
> diff -r 930d975c8103 src/sys/kern/subr_turnstile.c
> --- a/sys/kern/subr_turnstile.c Fri Dec 05 16:12:43 2008 -0800
> +++ b/sys/kern/subr_turnstile.c Fri Dec 12 14:31:16 2008 -0800
> @@ -219,7 +219,10 @@
>  #ifdef DDB
>  db_trace_thread(td, -1);
>  #endif
> - panic("sleeping thread");
> + /* Don't propagate priority to a sleeping thread. */
> + thread_unlock(td);
> + return;
> + // panic("sleeping thread");
>  }
>
>  /*

Does anyone have any updates on this? I just got a "sleeping thread"
panic in ZFS after doing a zfs rollback. Unfortunately, running "panic"
in the debugger resulted in "dump device too small" (despite the dump
device being RAM-sized), so I don't have a backtrace.

However, the backtrace I saw in the debugger was *not* the same as
yours. There was no _sx_xlock in it, but that's pretty much all I know
about it. :(

Regards,
Thomas