Date:      Wed, 27 Jul 2011 13:41:34 -0700
From:      David P Discher <dpd@bitgravity.com>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        freebsd-fs@FreeBSD.org, Andriy Gapon <avg@freebsd.org>
Subject:   Re: zfs process hang on pool access
Message-ID:  <6703F0BB-D4FC-4417-B519-CAFC62E5BC39@bitgravity.com>
In-Reply-To: <4E302204.2030009@FreeBSD.org>
References:  <A14F1C768A41483C876AD77502A864D6@multiplay.co.uk> <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk> <4E3013DF.10803@FreeBSD.org> <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk> <4E301C55.7090105@FreeBSD.org> <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk> <4E301F10.6060708@FreeBSD.org> <63705B5AEEAD4BB88ADB9EF770AB6C76@multiplay.co.uk> <4E302204.2030009@FreeBSD.org>

The way I found this was by breaking into the debugger, taking some
backtraces, continuing, breaking in again, and taking more backtraces of
the hung processes to see what was going on, then walking through the code.

Then, once I had specific loops and code locations, I asked the higher
powers of the FreeBSD kernel world.

Of course, I also had the high CPU usage to go on, and was peeking at the
arc_reclaim_thread.

I've seen this nearly like clockwork in production at 106-107 days of
uptime. If it goes on much longer than that, things deadlock.

But at 112 days on 8.2, you have the LBOLT overflow for sure.

Otherwise, reboot and patch.  However, I have not fully vetted the patch
under heavy load, and I am currently seeing another deadlock issue with
8.1+ zfs v14, seemingly during writes, after 6-40 hours.  Still
investigating.

Note that my proposal of using "time_uptime" doesn't work, as it causes a
buildworld error in the zfs userland tools.

This is what I'm currently running to fix the 26-day issue with the l2arc
feeder and arc_reclaim_thread with LBOLT in 8.1:


Index: sys/cddl/compat/opensolaris/sys/time.h
===================================================================
--- sys/cddl/compat/opensolaris/sys/time.h      (.../8.1-BGOS-20110105) (revision 3322)
+++ sys/cddl/compat/opensolaris/sys/time.h      (.../8.1-BGOS-20110613) (working copy)
@@ -38,7 +38,7 @@
 
 typedef longlong_t     hrtime_t;
 
-#define        LBOLT   ((gethrtime() * hz) / NANOSEC)
+#define        LBOLT   (gethrtime() * (NANOSEC/hz))
 
 #if defined(__i386__) || defined(__powerpc__)
 #define        TIMESPEC_OVERFLOW(ts)                                           \

Index: sys/cddl/compat/opensolaris/sys/types.h
===================================================================
--- sys/cddl/compat/opensolaris/sys/types.h     (.../8.1-BGOS-20110105) (revision 3322)
+++ sys/cddl/compat/opensolaris/sys/types.h     (.../8.1-BGOS-20110613) (working copy)
@@ -34,6 +34,12 @@
  */
 
 #include <sys/stdint.h>
+
+#ifdef _KERNEL
+typedef        int64_t         clock_t;
+#define        _CLOCK_T_DECLARED
+#endif
+
 #include_next <sys/types.h>
 
 #define        MAXNAMELEN      256


---
David P. Discher
dpd@bitgravity.com * AIM: bgDavidDPD
BITGRAVITY * http://www.bitgravity.com

On Jul 27, 2011, at 7:34 AM, Andriy Gapon wrote:

>> Ahh, is there anyway to confirm that before I reboot, or any other
>> information we could glean that might be useful?
>
> No quick ideas, unfortunately.



