From owner-freebsd-current@FreeBSD.ORG  Sat May 16 03:13:33 2009
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A12A21065677
	for <freebsd-current@freebsd.org>; Sat, 16 May 2009 03:13:33 +0000 (UTC)
	(envelope-from mcdouga9@egr.msu.edu)
Received: from mx.egr.msu.edu (surfnturf.egr.msu.edu [35.9.37.164])
	by mx1.freebsd.org (Postfix) with ESMTP id 6F0E78FC2C
	for <freebsd-current@freebsd.org>; Sat, 16 May 2009 03:13:33 +0000 (UTC)
	(envelope-from mcdouga9@egr.msu.edu)
Received: from localhost (localhost [127.0.0.1])
	by mx.egr.msu.edu (Postfix) with ESMTP id BBD0371F273;
	Fri, 15 May 2009 23:13:32 -0400 (EDT)
X-Virus-Scanned: amavisd-new at egr.msu.edu
Received: from mx.egr.msu.edu ([127.0.0.1])
	by localhost (surfnturf.egr.msu.edu [127.0.0.1]) (amavisd-new,
	port 10024)
	with ESMTP id Ni6GnWw7DpLd; Fri, 15 May 2009 23:13:32 -0400 (EDT)
Received: from localhost (daemon.egr.msu.edu [35.9.44.65])
	by mx.egr.msu.edu (Postfix) with ESMTP id 6403A71F26F;
	Fri, 15 May 2009 23:13:32 -0400 (EDT)
Received: by localhost (Postfix, from userid 21281)
	id 618B6DC3; Fri, 15 May 2009 23:13:32 -0400 (EDT)
Date: Fri, 15 May 2009 23:13:32 -0400
From: Adam McDougall <mcdouga9@egr.msu.edu>
To: Ben Kelly <ben@wanderview.com>
Message-ID: <20090516031332.GG82547@egr.msu.edu>
References: <08D7DC2A-68BE-47B6-8D5D-5DE6B48F87E5@wanderview.com>
	<AC3C4C3F-40C6-4AF9-BAF3-2C4D1E444839@wanderview.com>
	<ed91d4a80904142135n429dea52o672abf51116fa707@mail.gmail.com>
	<ed91d4a80904241816r28531a04r2dc70fa8960d430e@mail.gmail.com>
	<bc2d970904241947r50576efbgc93164a9e4dd297d@mail.gmail.com>
	<ed91d4a80904242059n3642a40aud55df6d1b6a1695@mail.gmail.com>
	<FC83DB1E-6C08-4BD4-8BC9-437D714FEE9E@wanderview.com>
	<ed91d4a80904271839l49420c8rbcfd52dd6e72eb83@mail.gmail.com>
	<ed91d4a80904281111q3b9a3c45vc9fcf129dde8c10d@mail.gmail.com>
	<F86D3461-3ABD-4A56-B9A6-36857364DF4B@wanderview.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <F86D3461-3ABD-4A56-B9A6-36857364DF4B@wanderview.com>
User-Agent: Mutt/1.5.19 (2009-01-05)
Cc: freebsd-current@freebsd.org, Artem Belevich <fbsdlist@src.cx>
Subject: Re: [patch] zfs livelock and thread priorities
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>, 
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 16 May 2009 03:13:34 -0000

On Tue, Apr 28, 2009 at 04:52:23PM -0400, Ben Kelly wrote:

  On Apr 28, 2009, at 2:11 PM, Artem Belevich wrote:
  > My system had eventually deadlocked overnight, though it took much
  > longer than before to reach that point.
  >
  > In the end I've got many many processes sleeping in zio_wait with no
  > disk activity whatsoever.
  > I'm not sure if that's the same issue or not.
  >
  > Here are stack traces for all processes -- http://pastebin.com/f364e1452
  > I've got the core saved, so if you want me to dig out some more info,
  > let me know if/how I could help.
  
  It looks like there is a possible deadlock between zfs_zget() and  
  zfs_zinactive().  They both acquire a lock via ZFS_OBJ_HOLD_ENTER().   
  The zfs_zinactive() path can get called indirectly from within  
  zio_done().  The zfs_zget() can in turn block waiting for zio_done()'s  
  completion while holding the object lock.
  
  The following patch might help:
  
     http://www.wanderview.com/svn/public/misc/zfs/zfs_zinactive_deadlock.diff
  
  This simply bails out of the inactive processing if the object lock is  
  already held.  I'm not sure if this is 100% correct or not as it  
  cannot verify there are references to the vnode.  I also tried  
  executing the zfs_zinactive() logic in a taskqueue to avoid the  
  deadlock, but that caused other deadlocks to occur.
  
  Hope that helps.
  
  - Ben
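
For anyone following along, here is a rough userland sketch of the
"bail out if the object lock is already held" idea described above.
It is not the actual patch and not kernel code: it assumes a plain
pthread mutex in place of the lock taken by ZFS_OBJ_HOLD_ENTER(), and
the names obj_lock, inactive_runs, try_inactive() and
call_while_holding() are made up for illustration. For simplicity the
lock holder and the callback run in the same thread here, whereas in
the real case they are different threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-object lock (assumption, not ZFS code). */
static pthread_mutex_t obj_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts how many times the inactive work actually ran. */
static int inactive_runs = 0;

/*
 * Sketch of the bail-out: try the lock instead of blocking on it.
 * Returns true if the inactive work ran, false if it was skipped
 * because the lock was already held.
 */
static bool
try_inactive(void)
{
	if (pthread_mutex_trylock(&obj_lock) != 0)
		return (false);	/* lock busy: skip rather than wait */
	inactive_runs++;	/* placeholder for the real inactive work */
	pthread_mutex_unlock(&obj_lock);
	return (true);
}

/*
 * Simulates the lock-holding side: take the object lock, then run a
 * completion callback while still holding it. A callback that blocked
 * on obj_lock here would never make progress.
 */
static bool
call_while_holding(bool (*callback)(void))
{
	bool ran;

	assert(callback != NULL);
	pthread_mutex_lock(&obj_lock);
	ran = callback();
	pthread_mutex_unlock(&obj_lock);
	return (ran);
}
```

Calling try_inactive() on its own does the work, while
call_while_holding(try_inactive) skips it and returns false. The cost
of this design is the one Ben points out: the skipped work has to be
safe to skip, which the sketch does not address.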

It's my understanding that the deadlock was fixed in -current.
How does that affect the usefulness of the thread priorities
patch?  Should I continue testing it, or is it effectively a
no-op now?

Also, I've been doing some fairly intense testing of ZFS in
recent -current, and I am tracking down a situation where
performance gets worse, but I think I have found a workaround.
I am gathering more data on the cause, workaround, symptoms,
and originating commit, and will post about it soon.