Date:      Tue, 9 Apr 2013 12:52:01 +0100
From:      Tom Evans <tevans.uk@googlemail.com>
To:        Quartz <quartz@sneakertech.com>
Cc:        FreeBSD FS <freebsd-fs@freebsd.org>
Subject:   Re: ZFS: Failed pool causes system to hang
Message-ID:  <CAFHbX1LO9OvbqyYYaob-7nQSA_dwQkMK7%2Bvn9c4QrXQuKvTCFA@mail.gmail.com>
In-Reply-To: <5163F03B.9060700@sneakertech.com>
References:  <2092374421.4491514.1365459764269.JavaMail.root@k-state.edu> <5163F03B.9060700@sneakertech.com>

On Tue, Apr 9, 2013 at 11:40 AM, Quartz <quartz@sneakertech.com> wrote:
>
>> So, you're not really waiting a long time....
>
>
> I still don't think you're 100% clear on what's happening in my case. I'm
> trying to explain that my problem is *prior* to the motherboard resetting,
> NOT after. If I hard-reset the machine with the front panel switch, it boots
> just fine every time.
>
> When my pool *FAILS* (ie; is unrecoverable because I lost too many drives)
> it hangs effectively all io on the entire machine. I can't cd or ls
> directories, I can't run any zfs commands, and I can't issue a reboot or
> halt. This is a hang. The machine is completely useless in this state. There
> is no disk or cpu activity churning. There's no pool (anymore) to be trying
> to resilver or whatever anyway.
>
> I'm not going to wait 3+ hours for "shutdown -r now" to bring the machine
> down. Especially not when I already know that zfs won't let it.
>

I think what Lawrence is trying to explain is that a "hang" is not
necessarily a deadlock: leaving the system alone for an extended period
may bring it back. Your point is also valid, though: a hang that long is
equivalent to a deadlock for your purposes. Computers, even essential
dedicated servers, sometimes hang, which is why it is common to have
some way of power cycling them remotely. If your server is important,
you need some sort of remote access controller (RAC) for these scenarios.

So, how do we find out where the hang is? Your ZFS pool and your root
disk probably share one thing in common (I've not seen a dmesg):
ATA/AHCI. If root does not use ATA/AHCI as well, does losing the pool
still cause problems with root? Perhaps breaking into ddb at this point
could tell us something.
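
One rough way to answer the "does root still respond?" question without
wedging yet another shell is to do the filesystem access in a throwaway
child process and give up on it after a few seconds. The following is
only a sketch, not something from this thread: it assumes Python is
installed and that the interpreter was started before the pool failed,
and the function name probe() and the 5 second timeout are my own
choices.

```python
import os
import signal
import time


def probe(path, timeout=5.0):
    """Return 'ok', 'error' or 'hung' for a stat() of path done in a forked child."""
    pid = os.fork()
    if pid == 0:
        # Child: one stat() call; the exit code reports success or failure.
        code = 1
        try:
            os.stat(path)
            code = 0
        except OSError:
            pass
        finally:
            os._exit(code)

    # Parent: poll for the child without blocking, up to the deadline.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        done, status = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
                return "ok"
            return "error"
        time.sleep(0.1)

    # Still blocked: request a kill but do not wait for it, because a
    # process stuck inside the kernel may never actually exit.
    os.kill(pid, signal.SIGKILL)
    return "hung"
```

Calling probe("/") and then probe("/tank") (where /tank is a
hypothetical pool mount point) after pulling the drives would show
whether only the pool or root as well has stopped answering. A "hung"
result only means the call did not return within the timeout; it does
not by itself distinguish a very long stall from a true deadlock.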

Cheers

Tom


