Date:      Wed, 03 Jul 2013 12:24:15 -0700
From:      Xin Li <delphij@delphij.net>
To:        Travis Mikalson <bofh@terranova.net>
Cc:        freebsd-fs@freebsd.org, kib@freebsd.org
Subject:   Re: Report: ZFS deadlock in 9-STABLE
Message-ID:  <51D47A5F.3030501@delphij.net>
In-Reply-To: <51D45401.5050801@terranova.net>
References:  <51D45401.5050801@terranova.net>

This is a multi-part message in MIME format.
--------------030900050708080305020709
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hi,

Sorry for top-posting, but I am fairly sure this is a known issue that
we have seen with a customer of ours.  Please try applying this
patch [1] and report back whether it fixes your problem.

If you would like to help further, we would appreciate it if you could
also test Konstantin's patch, at:

http://lists.freebsd.org/pipermail/freebsd-hackers/2013-May/042876.html

[1] See attachment; the commit is
https://github.com/trueos/trueos/commit/f678ae7c7f72fba577b00e3d0c237c4f297575c6
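
For readers who want the idea behind the attached diff without reading
kernel code: it replaces plain reads and writes of the it_need / ih_need
flags with atomic_store_rel_int() and atomic_load_acq_int(). Below is a
rough userland sketch of that release/acquire handoff using C11 atomics
and pthreads. It is only an illustration, not the kernel code: struct
worker, worker_request(), worker_poll() and run_handoff() are
hypothetical names, and the busy-wait stands in for the scheduler
interaction the real ithread loop performs.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an interrupt thread and its it_need flag. */
struct worker {
	atomic_int need;	/* nonzero: another pass has been requested */
	int payload;		/* plain data published before need is set */
	int seen;		/* last payload value the worker observed */
};

/*
 * Requesting side: write the data first, then set the flag with release
 * semantics (the role atomic_store_rel_int(&it->it_need, 1) plays in
 * the patch), so the data is visible to whoever observes the flag.
 */
static void
worker_request(struct worker *w, int value)
{
	assert(w != NULL);
	w->payload = value;
	atomic_store_explicit(&w->need, 1, memory_order_release);
}

/*
 * Worker side: test the flag with acquire semantics (the role of
 * atomic_load_acq_int(&ithd->it_need) in the patch).  If it is set,
 * clear it and run one pass.  Returns true if a pass was run.
 */
static bool
worker_poll(struct worker *w)
{
	if (atomic_load_explicit(&w->need, memory_order_acquire) == 0)
		return (false);
	atomic_store_explicit(&w->need, 0, memory_order_release);
	w->seen = w->payload;	/* ordered after the acquire load above */
	return (true);
}

/* Thread body: wait until one request has been handled, then exit. */
static void *
worker_main(void *arg)
{
	struct worker *w = arg;

	while (!worker_poll(w))
		sched_yield();	/* demo only; the kernel sleeps instead */
	return (NULL);
}

/*
 * Run a single request/handle handoff across two threads and return
 * the value the worker observed, or -1 if the thread could not start.
 */
static int
run_handoff(int value)
{
	struct worker w = { .payload = 0, .seen = -1 };
	pthread_t tid;

	atomic_init(&w.need, 0);
	if (pthread_create(&tid, NULL, worker_main, &w) != 0)
		return (-1);
	worker_request(&w, value);
	pthread_join(tid, NULL);	/* join makes w.seen safe to read */
	return (w.seen);
}
```

Calling run_handoff(42) from a small main() should hand the value 42 to
the worker thread. The point of the sketch is the pairing: the release
store on the requesting side and the acquire load on the worker side
together order the plain payload write before the worker's read of it.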

Cheers,

On 07/03/13 09:40, Travis Mikalson wrote:
> Hello,
> 
> To cut to the chase, I have a procstat -kk -a captured during a
> livelock for you here: 
> http://tog.net/freebsd/zfsdeadlock-storage1-20130703
> 
> The other relevant configurations I could think of to show you are 
> available within that http://tog.net/freebsd/ directory.
> 
> If you want any additional information that I haven't given here
> please let me know!
> 
> This is a FreeBSD 9-STABLE AMD64 system currently at: r250777: Sat
> May 18 17:41:39 EDT 2013
> 
> I didn't see too many relevant ZFS-related fixes after that date so
> am waiting for another round of interesting commits to update
> again.
> 
> Unfortunately, this system has been livelocking on average about
> once every 7-14 days. Its lot in life is a ZFS storage server
> serving NFS and istgt traffic.
> 
> It has 32GB of RAM and is an 8-core 2.6GHz Opteron 6212. The zpool
> looks like this; it has eight 1TB SAS drives and two SSDs being
> used for log and cache.
> 
>   pool: storage1
>  state: ONLINE
> status: The pool is formatted using a legacy on-disk format.  The pool
>         can still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
>         pool will no longer be accessible on software that does not
>         support feature flags.
>   scan: scrub repaired 0 in 6h4m with 0 errors on Sun Jan  6 06:39:38 2013
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         storage1    ONLINE       0     0     0
>           raidz1-0  ONLINE       0     0     0
>             da0     ONLINE       0     0     0
>             da2     ONLINE       0     0     0
>             da4     ONLINE       0     0     0
>             da6     ONLINE       0     0     0
>           raidz1-1  ONLINE       0     0     0
>             da1     ONLINE       0     0     0
>             da3     ONLINE       0     0     0
>             da5     ONLINE       0     0     0
>             da7     ONLINE       0     0     0
>         logs
>           mirror-2  ONLINE       0     0     0
>             da8p2   ONLINE       0     0     0
>             da9p2   ONLINE       0     0     0
>         cache
>           da8p3     ONLINE       0     0     0
>           da9p3     ONLINE       0     0     0
> 
> errors: No known data errors
> 


- -- 
Xin LI <delphij@delphij.net>    https://www.delphij.net/
FreeBSD - The Power to Serve!           Live free or die
-----BEGIN PGP SIGNATURE-----

iQEcBAEBCgAGBQJR1HpeAAoJEG80Jeu8UPuzxQAH/iwsYlntqDdNt+nLl45KxzKV
Zf0Nh1i0OMNJvSlMW/h1N89AChrCEjUQm+YNZ1+1QPR+kR/GiRsCHYeRzEYExfUH
98i0gGefr63/2vOML7+NgBc90Kf+cSdouMV+dOuhWNgD4t/aHbbJktIKR8Ye/T+8
20W89Ts34xr9D0IfcXhZB5JBlcBl9nrtD/vD7IZ2KVP8icjLh1TSKU8kEREka8EZ
MGS0EfDF8KjfzekGCaSV/AQTDpUdltcRqxE7bG5IWTu0sRGmemqZjD5ilAPX0ls9
LctLiwp/k7xBJ8cUR9Zq9wBd6ISSb6Cc90Pf8Rm60438sDzUdwk9l5m9+BxPX+U=
=ME+s
-----END PGP SIGNATURE-----

--------------030900050708080305020709
Content-Type: text/plain; charset=UTF-8;
 name="f678ae7c7f72fba577b00e3d0c237c4f297575c6.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="f678ae7c7f72fba577b00e3d0c237c4f297575c6.diff"

diff --git a/sys/kern/kern_intr.c b/sys/kern/kern_intr.c
index 33db213..75e0912 100644
--- a/sys/kern/kern_intr.c
+++ b/sys/kern/kern_intr.c
@@ -841,7 +841,7 @@ static void	priv_ithread_execute_handler(struct proc *p,
 		 * again and remove this handler if it has already passed
 		 * it on the list.
 		 */
-		ie->ie_thread->it_need = 1;
+		atomic_store_rel_int(&ie->ie_thread->it_need, 1);
 	} else
 		TAILQ_REMOVE(&ie->ie_handlers, handler, ih_next);
 	thread_unlock(ie->ie_thread->it_thread);
@@ -912,7 +912,7 @@ static void	priv_ithread_execute_handler(struct proc *p,
 	 * running.  Then, lock the thread and see if we actually need to
 	 * put it on the runqueue.
 	 */
-	it->it_need = 1;
+	atomic_store_rel_int(&it->it_need, 1);
 	thread_lock(td);
 	if (TD_AWAITING_INTR(td)) {
 		CTR3(KTR_INTR, "%s: schedule pid %d (%s)", __func__, p->p_pid,
@@ -990,7 +990,7 @@ static void	priv_ithread_execute_handler(struct proc *p,
 		 * again and remove this handler if it has already passed
 		 * it on the list.
 		 */
-		it->it_need = 1;
+		atomic_store_rel_int(&it->it_need, 1);
 	} else
 		TAILQ_REMOVE(&ie->ie_handlers, handler, ih_next);
 	thread_unlock(it->it_thread);
@@ -1066,7 +1066,7 @@ static void	priv_ithread_execute_handler(struct proc *p,
 	 * running.  Then, lock the thread and see if we actually need to
 	 * put it on the runqueue.
 	 */
-	it->it_need = 1;
+	atomic_store_rel_int(&it->it_need, 1);
 	thread_lock(td);
 	if (TD_AWAITING_INTR(td)) {
 		CTR3(KTR_INTR, "%s: schedule pid %d (%s)", __func__, p->p_pid,
@@ -1256,7 +1256,7 @@ static void	priv_ithread_execute_handler(struct proc *p,
 		 * interrupt threads always invoke all of their handlers.
 		 */
 		if (ie->ie_flags & IE_SOFT) {
-			if (!ih->ih_need)
+			if (atomic_load_acq_int(&ih->ih_need) == 0)
 				continue;
 			else
 				atomic_store_rel_int(&ih->ih_need, 0);
@@ -1358,7 +1358,7 @@ static void	priv_ithread_execute_handler(struct proc *p,
 		 * we are running, it will set it_need to note that we
 		 * should make another pass.
 		 */
-		while (ithd->it_need) {
+		while (atomic_load_acq_int(&ithd->it_need) != 0) {
 			/*
 			 * This might need a full read and write barrier
 			 * to make sure that this write posts before any
@@ -1377,7 +1377,8 @@ static void	priv_ithread_execute_handler(struct proc *p,
 		 * set again, so we have to check it again.
 		 */
 		thread_lock(td);
-		if (!ithd->it_need && !(ithd->it_flags & (IT_DEAD | IT_WAIT))) {
+		if ((atomic_load_acq_int(&ithd->it_need) == 0) &&
+		    !(ithd->it_flags & (IT_DEAD | IT_WAIT))) {
 			TD_SET_IWAIT(td);
 			ie->ie_count = 0;
 			mi_switch(SW_VOL | SWT_IWAIT, NULL);
@@ -1538,7 +1539,7 @@ static void	priv_ithread_execute_handler(struct proc *p,
 		 * we are running, it will set it_need to note that we
 		 * should make another pass.
 		 */
-		while (ithd->it_need) {
+		while (atomic_load_acq_int(&ithd->it_need) != 0) {
 			/*
 			 * This might need a full read and write barrier
 			 * to make sure that this write posts before any
@@ -1560,7 +1561,8 @@ static void	priv_ithread_execute_handler(struct proc *p,
 		 * set again, so we have to check it again.
 		 */
 		thread_lock(td);
-		if (!ithd->it_need && !(ithd->it_flags & (IT_DEAD | IT_WAIT))) {
+		if ((atomic_load_acq_int(&ithd->it_need) == 0) &&
+		    !(ithd->it_flags & (IT_DEAD | IT_WAIT))) {
 			TD_SET_IWAIT(td);
 			ie->ie_count = 0;
 			mi_switch(SW_VOL | SWT_IWAIT, NULL);

--------------030900050708080305020709--


