From owner-freebsd-stable@FreeBSD.ORG  Thu Nov 21 11:57:06 2013
To: freebsd-stable@freebsd.org
Subject: Hast locking up under 9.2
Message-Id: <E1VjSsY-000PXy-GC@dilbert.ingresso.co.uk>
From: Pete French <petefrench@ingresso.co.uk>
Date: Thu, 21 Nov 2013 11:57:02 +0000

I have had to (hopefully temporarily) disable hast on
our systems, as under 9.2 I am finding that it locks up under
high disc load. This has only started being a problem since we moved
from 8-STABLE to 9-STABLE; there was no locking up before.

I have a zpool on top of the hast devices - I did have two hast devices,
but the problem still occurs with a single device. The symptoms
are that I see the "dirty" count on the master side stick at 2.0 megs
and not change, the number of writes does not change, and if I use a "sync"
command at the command line it never returns - there is no disc activity
on either the primary or the secondary side. If I leave it like this it will
eventually freeze the whole machine, but usually if I see this happening I
reboot the stuck machine.
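
For anyone who wants to catch the same state automatically, here is a
rough sketch of the check I am doing by eye: sample the dirty count
periodically and flag it when it sits at the same non-zero value. It
assumes that "hastctl status" prints a line of the form
"dirty: <bytes> (<human readable>)", which may differ between versions,
and the function names are made up for illustration.

```python
import re

# Assumption: `hastctl status` emits a line such as "  dirty: 2097152 (2.0MB)".
# Adjust the pattern if your version formats the line differently.
DIRTY_RE = re.compile(r"^\s*dirty:\s*(\d+)", re.MULTILINE)


def parse_dirty(status_text):
    """Return the dirty byte count from status output, or None if not found."""
    match = DIRTY_RE.search(status_text)
    return int(match.group(1)) if match else None


def looks_stalled(samples, min_samples=5):
    """Return True if the last `min_samples` readings are identical and non-zero.

    `samples` is a list of integer dirty counts, oldest first. A count that
    stays at zero is treated as idle rather than stalled.
    """
    if len(samples) < min_samples:
        return False
    tail = samples[-min_samples:]
    return tail[0] > 0 and all(value == tail[0] for value in tail)
```

The sampling interval and the number of samples are guesses; a count
that is merely slow to drain would need a longer window to avoid false
alarms.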

This only happens under high levels of disc activity (in this case converting
a MySQL table from MyISAM to InnoDB, which causes a few gig of copies).
However, it is not simply high disc activity, as I can resilver the ZFS pool
quite happily without problems.

Frustratingly, I have a similar setup on a test pair of machines, but I cannot
reproduce the problem there.

I don't have any useful debugging output unfortunately, and I do
realise that "it locks up" is unhelpful! The only things
I see in the syslog are statements like this:

Nov 14 13:51:59 <daemon.err> serpentine-active hastd[1258]: [serp1] (primary) Worker process killed (pid=1520, signal=6).
Nov 14 13:51:59 <daemon.err> serpentine-passive hastd[14307]: [serp1] (secondary) Worker process exited ungracefully (pid=14638, exitcode=75).
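
For reference, signal 6 is SIGABRT and exit code 75 is EX_TEMPFAIL from
sysexits(3). A minimal sketch for pulling those out of the log lines
above follows; the regular expression is written against these two
lines only and the lookup tables are deliberately limited to the values
seen here, so treat it as illustrative rather than a general parser.

```python
import re

# Only the values that appear in the log lines above; extend as needed.
SIGNALS = {6: "SIGABRT"}
EXIT_CODES = {75: "EX_TEMPFAIL"}

# Matches "[resource] (role) Worker process ... (pid=N, signal=N)" or
# the same with "exitcode=N". Pattern derived from the two sample lines.
LOG_RE = re.compile(
    r"\[(?P<res>[^\]]+)\] \((?P<role>primary|secondary)\) Worker process "
    r".*?\(pid=(?P<pid>\d+), (?P<kind>signal|exitcode)=(?P<val>\d+)\)"
)


def decode(line):
    """Return (resource, role, pid, reason) for a worker line, else None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    value = int(match.group("val"))
    table = SIGNALS if match.group("kind") == "signal" else EXIT_CODES
    # Fall back to the raw number when the value is not in the table.
    reason = table.get(value, str(value))
    return (match.group("res"), match.group("role"),
            int(match.group("pid")), reason)
```

If the SIGABRT comes from an abort() in the primary worker, that would
point at a failed internal check rather than an external kill, but that
is a guess from the signal number alone.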

That's about all the info I have - currently I have taken hast out of the stack
and am trying to cobble something together manually using
iSCSI, but I would prefer to go back to hast if possible. Has anyone seen
anything similar, or have any suggestions?

thanks,

-pete.