From: "Justin T. Gibbs"
Reply-To: gibbs@FreeBSD.org
Organization: The FreeBSD Project
To: fs@FreeBSD.org
Date: Fri, 10 Jun 2011 11:32:52 -0600
Message-ID: <4DF25544.3020301@FreeBSD.org>
Subject: Drop of spa_namespace lock in vdev_geom.c

Dropping and reacquiring the spa_namespace lock in vdev_geom_open() creates a lock order reversal with the spa_config locks. As the spa_config locks are not standard mutexes, witness will not warn about this issue. I only noticed this problem when debugging a ZFS deadlock.
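
To make the reversal concrete, here is a toy model in C. It is only a sketch, not code from vdev_geom.c: the lock identifiers, the tracker, and simulate_open() are all hypothetical names. It assumes, as the description above implies, that the usual order is namespace lock first and config lock second, and that the open path is entered with both held. The tracker stands in for the kind of order checking witness would do if it could see both lock classes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock identifiers; they only label the two lock classes. */
enum lock_id { LOCK_NAMESPACE = 0, LOCK_CONFIG = 1, LOCK_COUNT };

/* order_seen[a][b] is true once b has been acquired while a was held. */
struct order_tracker {
	bool order_seen[LOCK_COUNT][LOCK_COUNT];
};

/* Locks currently held by one (simulated) thread. */
struct thread_state {
	bool held[LOCK_COUNT];
};

/*
 * Record that the thread acquires lock l.  Returns true if this
 * acquisition reverses an order that was recorded earlier.
 */
bool
tracker_acquire(struct order_tracker *t, struct thread_state *ts,
    enum lock_id l)
{
	bool reversal = false;

	assert((int)l < LOCK_COUNT);
	for (int h = 0; h < LOCK_COUNT; h++) {
		if (!ts->held[h] || h == (int)l)
			continue;
		if (t->order_seen[l][h])	/* l-before-h was seen earlier */
			reversal = true;
		t->order_seen[h][l] = true;	/* now h-before-l is seen too */
	}
	ts->held[l] = true;
	return (reversal);
}

void
tracker_release(struct thread_state *ts, enum lock_id l)
{
	ts->held[l] = false;
}

/*
 * Replay the pattern under the assumptions stated above.  Returns true
 * if the tracker sees a lock order reversal.
 */
bool
simulate_open(bool drop_namespace_lock)
{
	struct order_tracker t = {0};
	struct thread_state other = {0}, opener = {0};
	bool reversal = false;

	/* Some other path establishes the order: namespace, then config. */
	reversal |= tracker_acquire(&t, &other, LOCK_NAMESPACE);
	reversal |= tracker_acquire(&t, &other, LOCK_CONFIG);
	tracker_release(&other, LOCK_CONFIG);
	tracker_release(&other, LOCK_NAMESPACE);

	/* The opener enters holding both locks, taken in the same order. */
	reversal |= tracker_acquire(&t, &opener, LOCK_NAMESPACE);
	reversal |= tracker_acquire(&t, &opener, LOCK_CONFIG);

	if (drop_namespace_lock) {
		/* Drop the namespace lock, then retake it under config. */
		tracker_release(&opener, LOCK_NAMESPACE);
		reversal |= tracker_acquire(&t, &opener, LOCK_NAMESPACE);
	}

	tracker_release(&opener, LOCK_CONFIG);
	tracker_release(&opener, LOCK_NAMESPACE);
	return (reversal);
}
```

In this model, simulate_open(true) reports a reversal because the namespace lock is retaken while the config lock is still held, and simulate_open(false) does not. A real deadlock additionally needs a second thread to take the locks in the usual order at the wrong moment.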

The deadlock can be triggered any time multiple insert/remove processes are running at once (e.g. vdev orphan processing while a fault management daemon is onlining a replacement device for some other vdev).

I haven't noticed any issues with simply holding the namespace lock for the duration of the open. Does anyone know why this lock drop was added in v28?

Thanks,
Justin