From: Adam McDougall
Date: Fri, 15 May 2009 23:13:32 -0400
To: Ben Kelly
Cc: freebsd-current@freebsd.org, Artem Belevich
Subject: Re: [patch] zfs livelock and thread priorities
Message-ID: <20090516031332.GG82547@egr.msu.edu>
References: <08D7DC2A-68BE-47B6-8D5D-5DE6B48F87E5@wanderview.com>

On Tue, Apr 28, 2009 at 04:52:23PM -0400, Ben Kelly wrote:

  On Apr 28, 2009, at 2:11 PM, Artem Belevich wrote:

  > My system had eventually deadlocked overnight, though it took much
  > longer than before to reach that point.
  >
  > In the end I've got many many processes sleeping in zio_wait with no
  > disk activity whatsoever.
  > I'm not sure if that's the same issue or not.
  >
  > Here are stack traces for all processes -- http://pastebin.com/f364e1452
  > I've got the core saved, so if you want me to dig out some more info,
  > let me know if/how I could help.

  It looks like there is a possible deadlock between zfs_zget() and
  zfs_zinactive().  They both acquire a lock via ZFS_OBJ_HOLD_ENTER().
  The zfs_zinactive() path can get called indirectly from within
  zio_done().  The zfs_zget() can in turn block waiting for zio_done()'s
  completion while holding the object lock.

  The following patch might help:

    http://www.wanderview.com/svn/public/misc/zfs/zfs_zinactive_deadlock.diff

  This simply bails out of the inactive processing if the object lock is
  already held.  I'm not sure if this is 100% correct or not as it cannot
  verify there are references to the vnode.  I also tried executing the
  zfs_zinactive() logic in a taskqueue to avoid the deadlock, but that
  caused other deadlocks to occur.

  Hope that helps.

  - Ben

It's my understanding that the deadlock was fixed in -current.  How does
that affect the usefulness of the thread priorities patch?  Should I
continue testing it, or is it effectively a NOOP now?

Also, I've been doing some fairly intense testing of zfs in recent
-current, and I am tracking down a situation where performance gets
worse, but I think I found a workaround.  I am gathering more data
regarding the cause, workaround, symptoms, and originating commit, and
will post about it soon.
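
For readers following the thread, the bail-out idea in Ben's quoted message can be sketched in user-space C. This is only an illustration, not the code from the linked diff: struct obj_hold and the functions obj_hold_tryenter(), obj_hold_exit(), obj_inactive(), and obj_get() are hypothetical stand-ins, and the cross-thread wait on zio_done() is collapsed into a direct call within one thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-object lock behind ZFS_OBJ_HOLD_ENTER(). */
struct obj_hold {
    atomic_bool held;      /* true while some path owns the object lock */
    int inactive_runs;     /* how many times the inactive cleanup ran */
};

void obj_hold_init(struct obj_hold *h)
{
    atomic_init(&h->held, false);
    h->inactive_runs = 0;
}

/* Non-blocking acquire: returns true only if the lock was free. */
bool obj_hold_tryenter(struct obj_hold *h)
{
    return !atomic_exchange(&h->held, true);
}

void obj_hold_exit(struct obj_hold *h)
{
    atomic_store(&h->held, false);
}

/*
 * Inactive path (models zfs_zinactive()): if the object lock is already
 * held, skip the cleanup instead of blocking.  Returns true if cleanup ran.
 */
bool obj_inactive(struct obj_hold *h)
{
    if (!obj_hold_tryenter(h))
        return false;          /* lock busy: bail out */
    h->inactive_runs++;
    obj_hold_exit(h);
    return true;
}

/*
 * Get path (models zfs_zget()): holds the object lock while the I/O
 * completion runs.  Assumption: completion is modeled as a direct call
 * that reaches the inactive path.  Returns what the inactive path returned.
 */
bool obj_get(struct obj_hold *h)
{
    bool cleaned;

    while (!obj_hold_tryenter(h))
        ;                      /* blocking acquire, simplified to a spin */
    cleaned = obj_inactive(h); /* a blocking acquire here would never return */
    obj_hold_exit(h);
    return cleaned;
}
```

With these definitions, obj_get() on a freshly initialized obj_hold returns false because the inactive path sees the lock held and skips its work, while a later standalone obj_inactive() call runs the cleanup. As Ben's message cautions, skipping the cleanup may not be fully correct, so this shows only the locking shape.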