Date:      Thu, 21 Nov 2013 22:37:12 +0200
From:      Mikolaj Golub <trociny@FreeBSD.org>
To:        Pete French <petefrench@ingresso.co.uk>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Hast locking up under 9.2
Message-ID:  <20131121203711.GA3736@gmail.com>
In-Reply-To: <E1VjSsY-000PXy-GC@dilbert.ingresso.co.uk>
References:  <E1VjSsY-000PXy-GC@dilbert.ingresso.co.uk>

On Thu, Nov 21, 2013 at 11:57:02AM +0000, Pete French wrote:
> I have had to (hopefully temporarily) disable hast on
> our systems as under 9.2 I am finding that it locks up under
> high disc load. This has only started being a problem after we moved
> from 8-STABLE to 9-STABLE; there was no locking up before.

I remember asking you which replication mode you were using, but I
don't remember you answering. One of the significant changes is memsync
mode, which is the default in 9.2 (it was fullsync in earlier versions).
So if you are using the default settings, you can try switching to
fullsync as a workaround.
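
For illustration, switching a resource to fullsync in /etc/hast.conf
could look roughly like the fragment below. The resource name is taken
from your log lines; the node names are guessed from the syslog
hostnames, and the local provider and remote addresses are hypothetical
placeholders, so keep whatever your existing config already has there.

```
resource serp1 {
        # Force the pre-9.2 default instead of memsync.
        replication fullsync

        # Node names guessed from the syslog hostnames (assumption).
        on serpentine-active {
                local /dev/da0                  # hypothetical provider
                remote tcp4://192.0.2.2         # hypothetical address
        }
        on serpentine-passive {
                local /dev/da0                  # hypothetical provider
                remote tcp4://192.0.2.1         # hypothetical address
        }
}
```

The same "replication fullsync" line can also go in the global section
if you want it to apply to every resource.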

> I don't have any useful debugging unfortunately, and I do
> realise that "it locks up" is unhelpful! The only things
> I see in the syslog are statements like this:
> 
> Nov 14 13:51:59 <daemon.err> serpentine-active hastd[1258]: [serp1] (primary) Worker process killed (pid=1520, signal=6).
> Nov 14 13:51:59 <daemon.err> serpentine-passive hastd[14307]: [serp1] (secondary) Worker process exited ungracefully (pid=14638, exitcode=75).

signal=6 (SIGABRT) means that hastd crashed because an assertion failed.
Usually an "Assertion failed ..." message precedes this line in the
logs. Do you see such a message? It would be very helpful.
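
A quick way to look for that message is to filter the syslog file for
hastd lines that mention a failed assertion. This is only a sketch: the
function name is made up, and the default path /var/log/messages is an
assumption (your syslog.conf may send daemon messages elsewhere).

```shell
# Print hastd "Assertion failed" lines from a syslog file.
# $1: log file to search (assumed default: /var/log/messages)
hastd_assertions() {
    grep 'hastd\[' "${1:-/var/log/messages}" | grep 'Assertion failed'
}
```

Run it as "hastd_assertions /path/to/your/log" on both the primary and
the secondary node.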

Do you always see this error when it gets stuck?

Unfortunately the crash did not generate a core (due to capsicum). When
I want to get a coredump I rebuild hastd with the
CFLAGS+=-DHAVE_CAPSICUM line removed from the Makefile (and with
debugging symbols). There might be an easier method, but I don't know
of one.

If you don't find the assertion message and the crashes are
reproducible, it would be helpful to rebuild hastd with symbols and
capsicum disabled to make it coredump and provide the backtrace.
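
A rough sketch of that rebuild, wrapped in a shell function so it is
easy to repeat. The function name is made up, and it assumes the
sources live in /usr/src/sbin/hastd and that -DHAVE_CAPSICUM sits on a
line of its own in the Makefile; check both before running it.
DEBUG_FLAGS=-g is the usual way to get an unstripped binary with
symbols from the FreeBSD build.

```shell
# Rebuild hastd without capsicum and with debugging symbols.
# $1: hastd source directory (assumed default: /usr/src/sbin/hastd)
# Runs in a subshell so the caller's working directory is unchanged.
rebuild_hastd_nocap() (
    cd "${1:-/usr/src/sbin/hastd}" || exit 1
    # Keep the original Makefile so the change is easy to revert.
    cp Makefile Makefile.orig || exit 1
    # Drop the line that enables capsicum sandboxing.
    grep -v 'DHAVE_CAPSICUM' Makefile.orig > Makefile
    make clean &&
        make DEBUG_FLAGS=-g &&
        make DEBUG_FLAGS=-g install
)
```

After installing, restart hastd and wait for the next crash; the
backtrace can then be taken from the core file with gdb.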

Also, when hastd gets stuck you can generate a core of the
live process with gcore(1).
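
Something along these lines would dump every running hastd process
(the parent and its workers) in one go. The function name and the
/var/tmp output directory are arbitrary choices for the example; it
relies only on pgrep(1) and on the -c option of gcore(1).

```shell
# Dump a core for every running hastd process.
# $1: directory for the core files (assumed default: /var/tmp)
dump_hastd_cores() {
    outdir="${1:-/var/tmp}"
    for pid in $(pgrep hastd); do
        # gcore -c names the output file; the process keeps running.
        gcore -c "$outdir/hastd.$pid.core" "$pid"
    done
}
```

The cores are most useful if hastd was built with debugging symbols.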

> That's about all the info I have - currently I have taken hast out of the stack
> and am trying to cobble something together manually using
> iscsi, but I would prefer to go back to hast if possible. Has anyone seen
> anything similar, or have any suggestions?

What revision are you using? Recently there was a fix for crashes
triggered by this failed assertion:

 Assertion failed: (amp->am_memtab[ext] > 0), function
 activemap_write_complete, file activemap.c, line 351.

It was merged to STABLE/9 in r257470 (2013-10-31).

-- 
Mikolaj Golub


