Date:      Fri, 22 Nov 2013 11:18:29 +0000
From:      Pete French <petefrench@ingresso.co.uk>
To:        petefrench@ingresso.co.uk, trociny@FreeBSD.org
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Hast locking up under 9.2
Message-ID:  <E1Vjokn-000OuU-1Y@dilbert.ingresso.co.uk>
In-Reply-To: <20131121203711.GA3736@gmail.com>

> I remember already asking you which replication mode you were using and
> don't remember you answering. One of the significant changes is memsync
> mode, which is the default in 9.2 (it was fullsync in earlier versions).
> So if you are using default settings you can try switching to fullsync
> as a workaround.

Yes, I am using the default settings, so that is something I
can try. After three days of downtime last week I will not try it
in the immediate future though, for fear of my colleagues wanting
to strangle me :-) I will enable it on the test system, however, and try it
on live in a couple of weeks if I can.
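
For reference, a minimal sketch of what that change might look like in
/etc/hast.conf, assuming the mode is set per resource. The resource name
"shared" is only a placeholder, and the rest of the resource definition
is left out here because it would stay as it is:

```
# /etc/hast.conf (fragment; "shared" is a placeholder resource name)
resource shared {
	# 9.2 defaults to memsync; select fullsync explicitly as a workaround
	replication fullsync
	# existing local/remote/"on <node>" settings stay unchanged
}
```

The same "replication fullsync" line could instead go in the global
section if every resource should use it.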

> signal=6 means that hastd crashed due to a failed assertion.
> Usually "Assertion failed ..." message precedes this line in the
> logs. Don't you see such a message? It might be very helpful.

Yes, I do actually!

"Assertion failed: (!hio->hio_done), function write_complete, file /usr/src/sbin/hastd/primary.c, line 1130."

> Do you always see this error when it gets stuck?

That I do not know, I am afraid - I was too busy getting the systems back online
to have time to try and reconcile the downtimes with what is in the logfiles.
It was only yesterday that I started trying to trace what might have
happened.

> Unfortunately the crash did not generate a core (due to capsicum). When
> I want to get a coredump I rebuild hastd with CFLAGS+=-DHAVE_CAPSICUM
> removed in Makefile (and with debugging symbols). There might be an
> easier method but I don't know.
>
> If you don't find the assertion message and the crashes are
> reproducible, it would be helpful to rebuild hastd with symbols and
> capsicum disabled to make it coredump and provide the backtrace.
>
> Also, when hastd gets stuck you can generate a core of the
> live process with gcore(1).

I didn't know about gcore - that's a very useful feature! The crash
is reproducible, but not on any machine that I could actually
crash without causing extensive downtime to the rest of the business,
unfortunately. I can't deliberately crash our master database and
it doesn't crash on the test setup we have. But what I can do is run it up
live again with your suggested change to the config, and if it gets stuck
try to generate some more useful debugging information then.

> What revision are you using? Recently there was a fix for crashes
> triggered by this failed assertion:
>
>  Assertion failed: (amp->am_memtab[ext] > 0), function
>  activemap_write_complete, file activemap.c, line 351.

I'm using r257795 - I did an upgrade to get the fix for the above assertion,
and in general I keep an eye on the commits, and anything involving hast
or zfs I take as soon as I can to try and improve stability.

Thanks for the help - if I get any more info I will let
you know, or if the above assertion helps you track something down
then I may be able to try some patches.

cheers,

-pete.


