Date:      Thu, 2 Oct 2014 00:33:53 +0300
From:      George Kontostanos <gkontos.mail@gmail.com>
To:        Matt Churchyard <matt.churchyard@userve.net>
Cc:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: HAST with broken HDD
Message-ID:  <CA+dUSypF4171Rp8nVv1T=2ZiZLCmdXxFiEz-qLmOg0MqDPL_CQ@mail.gmail.com>
In-Reply-To: <97aab72e19d640ebb65c754c858043cc@SERVER.ad.usd-group.com>
References:  <542BC135.1070906@Skynet.be> <542BDDB3.8080805@internetx.com> <CA+dUSypO8xTR3sh_KSL9c9FLxbGH+bTR9-gPdcCVd+t0UgUF-g@mail.gmail.com> <542BF853.3040604@internetx.com> <CA+dUSyp4vMB_qUeqHgXNz2FiQbWzh8MjOEFYw+URcN4gUq69nw@mail.gmail.com> <542C019E.2080702@internetx.com> <CA+dUSyoEcPdJ1hdR3k1vNROFG7p1kN0HB5S2a_0gYhiV75OLAw@mail.gmail.com> <542C0710.3020402@internetx.com> <CA+dUSyr9OK9SvN3wX-O4DeriLBP-EEuAA8TTSYwdGfcR1asdtQ@mail.gmail.com> <97aab72e19d640ebb65c754c858043cc@SERVER.ad.usd-group.com>

On Wed, Oct 1, 2014 at 6:51 PM, Matt Churchyard <matt.churchyard@userve.net> wrote:

> On Wed, Oct 1, 2014 at 4:52 PM, InterNetX - Juergen Gotteswinter
> <jg@internetx.com> wrote:
>
> > > On 01.10.2014 at 15:49, George Kontostanos wrote:
> > > On Wed, Oct 1, 2014 at 4:29 PM, InterNetX - Juergen Gotteswinter
> > > <jg@internetx.com> wrote:
> > >
> > >     On 01.10.2014 at 15:06, George Kontostanos wrote:
> > >     >
> > >     >
> > >     > On Wed, Oct 1, 2014 at 3:49 PM, InterNetX - Juergen Gotteswinter
> > >     > <jg@internetx.com> wrote:
> > >     >
> > >     >     On 01.10.2014 at 14:28, George Kontostanos wrote:
> > >     >     >
> > >     >     > On Wed, Oct 1, 2014 at 1:55 PM, InterNetX - Juergen Gotteswinter
> > >     >     > <jg@internetx.com> wrote:
> > >     >     >
> > >     >     >     On 01.10.2014 at 10:54, JF-Bogaerts wrote:
> > >     >     >     >    Hello,
> > >     >     >     >    I'm preparing a HA NAS solution using HAST.
> > >     >     >     >    I'm wondering what will happen if one of the disks
> > >     >     >     >    of the primary node fails or becomes erratic.
> > >     >     >     >
> > >     >     >     >    Thx,
> > >     >     >     >    Jean-François Bogaerts
> > >     >     >
> > >     >     >     Nothing. If you are using ZFS on top of HAST, ZFS won't
> > >     >     >     even take notice of the disk failure.
> > >     >     >
> > >     >     >     As long as the write operation was successful on one of
> > >     >     >     the 2 nodes, HAST doesn't notify the layers on top about
> > >     >     >     I/O errors.
> > >     >     >
> > >     >     >     Interesting concept, took me some time to deal with this.
> > >     >     >
> > >     >     >
> > >     >     > Are you saying that the pool will appear to be optimal even
> > >     >     > with a bad drive?
> > >     >     >
> > >     >     >
> > >     >
> > >     >     https://forums.freebsd.org/viewtopic.php?&t=24786
> > >     >
> > >     >
> > >     >
> > >     > It appears that this is actually the case. And it is very
> > >     > disturbing, meaning that a drive failure goes unnoticed. In my case
> > >     > I completely removed the second disk on the primary node and a
> > >     > zpool status showed absolutely no problem. Scrubbing the pool began
> > >     > resilvering, which indicates that there is actually something wrong!
> > >
> > >
> > >     Right. Let's go further and think about how ZFS works regarding
> > >     direct hardware / disk access. There's a layer in between which
> > >     always says "hey, everything is fine". No more need for pool
> > >     scrubbing, since hastd won't tell you if anything is wrong :D
> > >
> > >
> > > Correct, ZFS needs direct access, and any layer in between might end
> > > up in disaster!!!
> > >
> > > Which means that, practically, HAST should only be used in UFS
> > > environments backed by a hardware controller. In that case, HAST will
> > > again not notice anything (unless you lose the controller), but at
> > > least you will know that you need to replace a disk by monitoring the
> > > controller status.
> > >
> >
> > IMHO this should be included at least as a notice/warning in the hastd
> > manpage; AFAIK there's no real warning about such problems with the
> > hastd/zfs combo. But lots of howtos are out there describing exactly
> > such setups.
> >
> > Yes, it should. I actually wrote a guide like that when HAST was in its
> > early stages, but I never tested it for flaws. This thread started
> > ringing some bells!
>
>
>
> > Sad, since the comparable piece on Linux - DRBD - handles I/O errors
> > fine. The upper layers get notified like they should be, IMHO.
> >
> My next lab environment will be to try a similar DRBD setup, although
> some tests we performed last year with ZFS on Linux were not that
> promising.
>
> From what I can see HAST is working, at least in concept, as it should.
>
> If you install any filesystem on top of a RAID mirror, either disk can
> fail and the filesystem above should just continue on as if nothing
> happened. It's up to the RAID layer to notify you of the problem.
>
> HAST is basically "RAID1-over-network", so if a disk fails, it should just
> handle reads/writes using the other disk, and the filesystem on top, be it
> UFS/ZFS/whatever, should just carry on as normal (which is what has been
> observed). Of course, HAST (or the OS) should notify you of the disk error
> (probably through devd) so you can do something about it. Maybe it already
> exists, but HAST should be able to provide overall status information and
> raise events just like ZFS or any RAID subsystem would. You also, of course,
> shouldn't get scrub errors and corruption like that seen in the original
> post just because one half of the HAST mirror has gone.
>
> Personally I've not been brave enough to use HAST yet. It seems to me like
> there are too many possibilities for situations where things can go wrong.
> One of these that has been discussed on the forums is that a ZFS scrub will
> only read data from the local disk. You could happily run a service from
> the master server for years, scrubbing regularly, never knowing that your
> data may be corrupt on the second HAST node. One 'solution' mentioned for
> this would be to regularly switch the master/slave nodes, running scrubs on
> each one while they are master.
>
> --
> Matt
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

I believe that HAST began as FreeBSD's answer to DRBD. Given the fact that
FreeBSD had (and still has) the competitive advantage of ZFS, maybe it should
have been designed with that in mind. ZFS is not meant to work through a
middle layer; it needs direct access to the disks. So the mere fact that we
are using a technology that inserts another layer is already a show-stopper.
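
To make that concrete, here is roughly where one would have to look after
pulling a disk on the primary, since zpool itself stays quiet. This is only a
sketch: the resource and pool names (disk0, disk1, tank) are made up, and how
clearly hastctl and syslog report a dead local component differs between
FreeBSD releases, so treat the comments as assumptions rather than verified
behaviour:

  # On the primary node, after the disk behind resource "disk1" has died:
  zpool status -x tank           # ZFS still reports the pool as healthy
  hastctl status all             # check the per-resource state the HAST layer reports
  grep hastd /var/log/messages   # hastd normally complains to syslog about
                                 # the local I/O errors it absorbed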

There are of course other ways to achieve redundancy and avoid going over
the network. But in some cases, where you need two storage systems located
in two different DCs, HAST might have been a good choice.

Of course, as you mentioned before, in order for this to work we would need
HAST to monitor the health of every resource. That could come from devd,
from another HAST daemon, or a combination of both. The administrator should
be able to easily get warnings about faulty components. We would also need
to use a fence device instead of relying on VRRP.
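
Until something like that exists, even a trivial cron job on each node would
close part of the gap. The script below is only a sketch of the idea, not an
existing FreeBSD facility: the "degraded" match is an assumption about
hastctl's status output (which changes between releases), and the alert
address is made up:

  #!/bin/sh
  # hast-watch.sh - hypothetical HAST health check, run from cron on both nodes.
  ADMIN="storage-alerts@example.org"   # made-up alert address

  STATUS=$(hastctl status all 2>&1)
  # Alert if hastctl itself fails or if any resource reports itself degraded.
  if [ $? -ne 0 ] || echo "${STATUS}" | grep -qi "degraded"; then
      echo "${STATUS}" | mail -s "HAST problem on $(hostname)" "${ADMIN}"
  fi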

Anyway, that's food for thought :)
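
And on Matt's point about scrubs only ever reading the local disks: the
"switch roles and scrub on each side" workaround he mentions would look
roughly like this. It is a sketch with made-up resource and pool names, and
in a real setup the export/import would be done by whatever failover script
(VRRP/CARP or by hand) you already have:

  # On the current primary: stop using the pool and hand the resources over.
  zpool export tank
  hastctl role secondary disk0 disk1

  # On the other node, which now becomes primary:
  hastctl role primary disk0 disk1
  zpool import tank
  zpool scrub tank   # this scrub finally reads the copies on this node's disks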

-- 
George Kontostanos
---


