From: Xin Li
Organization: The FreeBSD Project
Date: Wed, 03 Jul 2013 12:24:15 -0700
To: Travis Mikalson
Cc: freebsd-fs@freebsd.org, kib@freebsd.org
Reply-To: d@delphij.net
Subject: Re: Report: ZFS deadlock in 9-STABLE
Message-ID: <51D47A5F.3030501@delphij.net>
In-Reply-To: <51D45401.5050801@terranova.net>

Hi,

Sorry for the top-posting, but I am quite convinced that this is a known
issue that we have seen with one of our customers. Please try applying
this patch [1] and report back whether it fixes your problem.

If you would like to help further, we would also appreciate it if you
could test Konstantin's patch, at:

http://lists.freebsd.org/pipermail/freebsd-hackers/2013-May/042876.html

[1] See attachment; the commit is
https://github.com/trueos/trueos/commit/f678ae7c7f72fba577b00e3d0c237c4f297575c6

Cheers,

On 07/03/13 09:40, Travis Mikalson wrote:
> Hello,
>
> To cut to the chase, I have a procstat -kk -a captured during a
> livelock for you here:
> http://tog.net/freebsd/zfsdeadlock-storage1-20130703
>
> The other relevant configurations I could think of to show you are
> available within that http://tog.net/freebsd/ directory.
>
> If you want any additional information that I haven't given here,
> please let me know!
>
> This is a FreeBSD 9-STABLE AMD64 system currently at: r250777: Sat
> May 18 17:41:39 EDT 2013
>
> I didn't see many relevant ZFS-related fixes after that date, so
> I am waiting for another round of interesting commits before updating
> again.
>
> Unfortunately, this system has been livelocking on average about
> once every 7-14 days. Its lot in life is a ZFS storage server
> serving NFS and istgt traffic.
>
> It has 32GB of RAM and is an 8-core 2.6GHz Opteron 6212. The zpool
> looks like this; it has eight 1TB SAS drives and two SSDs being
> used for log and cache.
>
>   pool: storage1
>  state: ONLINE
> status: The pool is formatted using a legacy on-disk format. The pool can
>         still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>         pool will no longer be accessible on software that does not support
>         feature flags.
>   scan: scrub repaired 0 in 6h4m with 0 errors on Sun Jan 6 06:39:38 2013
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage1    ONLINE       0     0     0
>           raidz1-0  ONLINE       0     0     0
>             da0     ONLINE       0     0     0
>             da2     ONLINE       0     0     0
>             da4     ONLINE       0     0     0
>             da6     ONLINE       0     0     0
>           raidz1-1  ONLINE       0     0     0
>             da1     ONLINE       0     0     0
>             da3     ONLINE       0     0     0
>             da5     ONLINE       0     0     0
>             da7     ONLINE       0     0     0
>         logs
>           mirror-2  ONLINE       0     0     0
>             da8p2   ONLINE       0     0     0
>             da9p2   ONLINE       0     0     0
>         cache
>           da8p3     ONLINE       0     0     0
>           da9p3     ONLINE       0     0     0
>
> errors: No known data errors

-- 
Xin LI https://www.delphij.net/
FreeBSD - The Power to Serve! Live free or die

[Attachment: f678ae7c7f72fba577b00e3d0c237c4f297575c6.diff]

diff --git a/sys/kern/kern_intr.c b/sys/kern/kern_intr.c
index 33db213..75e0912 100644
--- a/sys/kern/kern_intr.c
+++ b/sys/kern/kern_intr.c
@@ -841,7 +841,7 @@ static void priv_ithread_execute_handler(struct proc *p,
 		 * again and remove this handler if it has already passed
 		 * it on the list.
 		 */
-		ie->ie_thread->it_need = 1;
+		atomic_store_rel_int(&ie->ie_thread->it_need, 1);
 	} else
 		TAILQ_REMOVE(&ie->ie_handlers, handler, ih_next);
 	thread_unlock(ie->ie_thread->it_thread);
@@ -912,7 +912,7 @@ static void priv_ithread_execute_handler(struct proc *p,
 	 * running. Then, lock the thread and see if we actually need to
 	 * put it on the runqueue.
 	 */
-	it->it_need = 1;
+	atomic_store_rel_int(&it->it_need, 1);
 	thread_lock(td);
 	if (TD_AWAITING_INTR(td)) {
 		CTR3(KTR_INTR, "%s: schedule pid %d (%s)", __func__, p->p_pid,
@@ -990,7 +990,7 @@ static void priv_ithread_execute_handler(struct proc *p,
 		 * again and remove this handler if it has already passed
 		 * it on the list.
 		 */
-		it->it_need = 1;
+		atomic_store_rel_int(&it->it_need, 1);
 	} else
 		TAILQ_REMOVE(&ie->ie_handlers, handler, ih_next);
 	thread_unlock(it->it_thread);
@@ -1066,7 +1066,7 @@ static void priv_ithread_execute_handler(struct proc *p,
 	 * running. Then, lock the thread and see if we actually need to
 	 * put it on the runqueue.
 	 */
-	it->it_need = 1;
+	atomic_store_rel_int(&it->it_need, 1);
 	thread_lock(td);
 	if (TD_AWAITING_INTR(td)) {
 		CTR3(KTR_INTR, "%s: schedule pid %d (%s)", __func__, p->p_pid,
@@ -1256,7 +1256,7 @@ static void priv_ithread_execute_handler(struct proc *p,
 		 * interrupt threads always invoke all of their handlers.
 		 */
 		if (ie->ie_flags & IE_SOFT) {
-			if (!ih->ih_need)
+			if (atomic_load_acq_int(&ih->ih_need) == 0)
 				continue;
 			else
 				atomic_store_rel_int(&ih->ih_need, 0);
@@ -1358,7 +1358,7 @@ static void priv_ithread_execute_handler(struct proc *p,
 		 * we are running, it will set it_need to note that we
 		 * should make another pass.
 		 */
-		while (ithd->it_need) {
+		while (atomic_load_acq_int(&ithd->it_need) != 0) {
 			/*
 			 * This might need a full read and write barrier
 			 * to make sure that this write posts before any
@@ -1377,7 +1377,8 @@ static void priv_ithread_execute_handler(struct proc *p,
 		 * set again, so we have to check it again.
 		 */
 		thread_lock(td);
-		if (!ithd->it_need && !(ithd->it_flags & (IT_DEAD | IT_WAIT))) {
+		if ((atomic_load_acq_int(&ithd->it_need) == 0) &&
+		    !(ithd->it_flags & (IT_DEAD | IT_WAIT))) {
 			TD_SET_IWAIT(td);
 			ie->ie_count = 0;
 			mi_switch(SW_VOL | SWT_IWAIT, NULL);
@@ -1538,7 +1539,7 @@ static void priv_ithread_execute_handler(struct proc *p,
 		 * we are running, it will set it_need to note that we
 		 * should make another pass.
 		 */
-		while (ithd->it_need) {
+		while (atomic_load_acq_int(&ithd->it_need) != 0) {
 			/*
 			 * This might need a full read and write barrier
 			 * to make sure that this write posts before any
@@ -1560,7 +1561,8 @@ static void priv_ithread_execute_handler(struct proc *p,
 		 * set again, so we have to check it again.
 		 */
 		thread_lock(td);
-		if (!ithd->it_need && !(ithd->it_flags & (IT_DEAD | IT_WAIT))) {
+		if ((atomic_load_acq_int(&ithd->it_need) == 0) &&
+		    !(ithd->it_flags & (IT_DEAD | IT_WAIT))) {
 			TD_SET_IWAIT(td);
 			ie->ie_count = 0;
 			mi_switch(SW_VOL | SWT_IWAIT, NULL);