Date:      Thu, 20 Feb 2014 10:51:42 -0800
From:      <dteske@FreeBSD.org>
To:        <dcamp@alumni.ufl.edu>, <freebsd-questions@freebsd.org>
Cc:        dteske@FreeBSD.org
Subject:   RE: System freezes up during long-running ZFS disk activity
Message-ID:  <10c801cf2e6c$cc6599f0$6530cdd0$@FreeBSD.org>
In-Reply-To: <CADbaceJ00rk8RFMwi-S-HLNBX673j2DGe6SngUcvYTFTd5KFxw@mail.gmail.com>
References:  <CADbaceJ00rk8RFMwi-S-HLNBX673j2DGe6SngUcvYTFTd5KFxw@mail.gmail.com>



> -----Original Message-----
> From: Christian Campbell [mailto:dcamp@alumni.ufl.edu]
> Sent: Wednesday, February 19, 2014 12:07 PM
> To: freebsd-questions@freebsd.org
> Subject: System freezes up during long-running ZFS disk activity
> 
> I recently installed 9.2-RELEASE-p3 on a Dell Precision T5400. I'm using ZFS
> filesystem version: 5, ZFS storage pool version: features support (5000). The
> pool was imported from a previous 9.2 box on which it worked without issue.
> 
> I don't know if my problem is ZFS-related, but my ZFS use is why I noticed it
> and I seem to be able to reproduce it reliably. Every so often, from minutes
> to hours, my computer will freeze up while ZFS has been busy. This happens
> during a resilver, a scrub, and a long-running process reading millions of
> files from the pool. When it freezes, all output and input freezes: tasks
> like zpool iostat -v 1 or top stop updating their output, whether on the
> console or an ssh terminal over Ethernet. Pressing keys does not garner a
> response.* Sometimes a freeze lasts minutes and then proceeds on its own.
> Sometimes it goes on for hours. An action that typically, but not always,
> jogs it is unplugging the USB keyboard -- the disk activity resumes
> immediately, and any queued keyboard input immediately plays out whether on
> the console or over ssh. Lastly, my ssh terminal (PuTTY) will stay connected
> for hours during a freeze-up, *i.e.* the TCP circuit is not closed or timed
> out, as opposed to closing pretty quickly after the server is powered off.
> 
> In all cases, the system clock lags by the sum of the durations of the
> freezes.
> 
> * During an initial resilver, I noticed that pressing a key such as Ctrl on
> the USB keyboard would jog it, but pressing Ctrl or other keys doesn't jog my
> process of long-running IO activity. But in all cases, even when unplugging
> and replugging the USB keyboard doesn't jog it, Ctrl-Alt-Del prompts an
> orderly shutdown.
> 
> Debugging advice is very welcome!
> 
[Devin Teske] 

I had this exact same problem on a Dell 1U F1DH server. I didn't send any
e-mail to the mailing lists, because I feared I was going crazy.

Of course, it's been 30 days since I had that problem... if I try to remember
what it was... it was either the bad SATA port (which had loose soldering), or
it was the drive which said SATA port had fubar'd (putting that drive into
another system saw the same thing happen in that new system).

So what I did was rsync all the data off that drive to another one (and yes,
because I had to "jog" the system to get it to be responsive, in the same
exact situation you describe above, it took a very _very_ long time). But...
once I got off of that drive, everything looked much, much better.

I also found other ways to jog it were Alt+FN, and even the occasional ping
would jog it too. It appeared to be interrupt driven in some way.

Might I suggest that you have a drive acting up in your pool.
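
If it helps narrow down which drive that might be, here is a rough Python
sketch that scans text shaped like the config table printed by `zpool status`
and lists any device that is not ONLINE or has non-zero READ/WRITE/CKSUM
counters. The sample text, the pool and device names, and the function name
`suspect_devices` are all made up for illustration; this is not output from
either machine discussed here, and a drive can misbehave without showing any
of these counters.

```python
import re

# Hypothetical sample, laid out like the config table from `zpool status`.
# Pool and device names are invented for illustration only.
SAMPLE_STATUS = """\
  pool: tank
 state: ONLINE
config:

\tNAME        STATE     READ WRITE CKSUM
\ttank        ONLINE       0     0     0
\t  mirror-0  ONLINE       0     0     0
\t    ada0    ONLINE       0     0     0
\t    ada1    ONLINE       3     0    12

errors: No known data errors
"""

# One table row: name, state, then the three error counters.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)")


def suspect_devices(status_text):
    """Return (name, state, read, write, cksum) for rows that look unhealthy."""
    suspects = []
    in_table = False
    for line in status_text.splitlines():
        if line.split()[:2] == ["NAME", "STATE"]:
            in_table = True          # header row: the table starts here
            continue
        if not in_table:
            continue
        if not line.strip():
            in_table = False         # a blank line ends the table
            continue
        match = ROW.match(line)
        if not match:
            continue                 # e.g. section labels with fewer columns
        name, state, rd, wr, ck = match.groups()
        # Counters are compared as strings so abbreviated values still count.
        if state != "ONLINE" or (rd, wr, ck) != ("0", "0", "0"):
            suspects.append((name, state, rd, wr, ck))
    return suspects


if __name__ == "__main__":
    for row in suspect_devices(SAMPLE_STATUS):
        print(row)
```

To try it against a real pool, you could save the output of `zpool status` to
a file and pass the file's contents to `suspect_devices` in place of the
sample.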
-- 
Devin



