From: Mikolaj Golub
Date: Thu, 21 Nov 2013 22:37:12 +0200
To: Pete French
Subject: Re: Hast locking up under 9.2
Message-ID: <20131121203711.GA3736@gmail.com>
Cc: freebsd-stable@freebsd.org

On Thu, Nov 21, 2013 at 11:57:02AM +0000, Pete French wrote:
> I have had to (hopefully temporarily) disable hast on
> our systems as under 9.2 I am finding that it locks up under
> high disc load. This has only started being a problem after we moved
> from 8-STABLE to 9-STABLE; there was no locking up before.

I remember already asking you which replication mode you were using, and I don't remember you answering. One of the significant changes is memsync mode, which is the default in 9.2 (it was fullsync in earlier versions). So if you are using the default settings, you can try switching to fullsync as a workaround.

> I don't have any useful debugging unfortunately, and I do
> realise that "it locks up" is unhelpful! The only things
> I see in the syslog are statements like this:
>
> Nov 14 13:51:59 serpentine-active hastd[1258]: [serp1] (primary) Worker process killed (pid=1520, signal=6).
> Nov 14 13:51:59 serpentine-passive hastd[14307]: [serp1] (secondary) Worker process exited ungracefully (pid=14638, exitcode=75).

signal=6 (SIGABRT) means that hastd crashed because an assertion failed. Usually an "Assertion failed ..." message precedes this line in the logs. Don't you see such a message? It might be very helpful. Do you always see this error when it gets stuck?

Unfortunately the crash did not generate a core dump (because of capsicum). When I want a core dump, I rebuild hastd with CFLAGS+=-DHAVE_CAPSICUM removed from the Makefile (and with debugging symbols). There might be an easier method, but I don't know of one.
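A minimal sketch, not part of hastd or any FreeBSD tool, of how the two quoted log lines decode: signal 6 is SIGABRT, which abort() raises when an assert() fails, and exit code 75 is EX_TEMPFAIL in sysexits(3). The regular expressions and the classify() helper are hypothetical and only cover the two message formats quoted above; the sketch names the exit code but does not claim to explain why the secondary exited.

```python
import re
import signal

# Value of EX_TEMPFAIL from sysexits.h, hardcoded so the sketch is portable.
EX_TEMPFAIL = 75

# Hypothetical patterns matching the two hastd messages quoted in the report.
KILLED = re.compile(
    r"\[(?P<res>[^\]]+)\] \((?P<role>\w+)\) Worker process killed "
    r"\(pid=(?P<pid>\d+), signal=(?P<sig>\d+)\)")
EXITED = re.compile(
    r"\[(?P<res>[^\]]+)\] \((?P<role>\w+)\) Worker process exited ungracefully "
    r"\(pid=(?P<pid>\d+), exitcode=(?P<code>\d+)\)")


def classify(line):
    """Return (resource, role, pid, explanation) for a worker-exit line, else None."""
    m = KILLED.search(line)
    if m:
        sig = signal.Signals(int(m["sig"]))
        if sig == signal.SIGABRT:
            # A failed assert() calls abort(), which delivers SIGABRT (6).
            why = "SIGABRT: abort(), most likely a failed assertion"
        else:
            why = sig.name
        return m["res"], m["role"], int(m["pid"]), why
    m = EXITED.search(line)
    if m:
        code = int(m["code"])
        # Only EX_TEMPFAIL is named; other exit codes are reported as-is.
        why = "EX_TEMPFAIL" if code == EX_TEMPFAIL else f"exit code {code}"
        return m["res"], m["role"], int(m["pid"]), why
    return None


if __name__ == "__main__":
    log = [
        "Nov 14 13:51:59 serpentine-active hastd[1258]: [serp1] (primary) "
        "Worker process killed (pid=1520, signal=6).",
        "Nov 14 13:51:59 serpentine-passive hastd[14307]: [serp1] (secondary) "
        "Worker process exited ungracefully (pid=14638, exitcode=75).",
    ]
    for entry in log:
        print(classify(entry))
```

The search skips the "hastd[1258]" bracket because it is not followed by a role in parentheses, so the resource name is taken from the "[serp1]" field.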
If you don't find the assertion message and the crashes are reproducible, it would be helpful to rebuild hastd with symbols and capsicum disabled so that it dumps core, and then provide the backtrace. Also, when hastd gets stuck, you can generate a core of the live process with gcore(1).

> That's about all the info I have - currently I have taken hast out of the stack
> and am trying to cobble something together manually using
> iscsi, but I would prefer to go back to hast if possible. Has anyone seen
> anything similar, or have any suggestions?

What revision are you using? There was recently a fix for crashes triggered by this failed assertion:

Assertion failed: (amp->am_memtab[ext] > 0), function activemap_write_complete, file activemap.c, line 351.

It was merged to STABLE/9 in r257470 (2013-10-31).

-- 
Mikolaj Golub
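A small sketch for answering the revision question, assuming the system was built from a Subversion checkout of stable/9 so that `uname -v` contains a token such as `r258000`. Subversion revision numbers are repository-wide and only increase, so a stable/9 build at or after r257470 should include the merged activemap fix. The function name and sample strings are hypothetical; also note that `uname -v` reports the kernel's revision while hastd comes from the world build, so the answer only holds if both were built from the same checkout.

```python
import re

# stable/9 revision that carries the activemap_write_complete fix (from the reply).
FIX_REVISION = 257470


def has_activemap_fix(uname_version):
    """Guess whether a stable/9 build includes the fix.

    Returns True or False when an svn revision such as 'r258000' appears in
    the string, or None when no revision is found (e.g. a release build).
    """
    match = re.search(r"\br(\d+)", uname_version)
    if match is None:
        return None
    return int(match.group(1)) >= FIX_REVISION


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    for sample in ("FreeBSD 9.2-STABLE #0 r258000", "FreeBSD 9.2-STABLE #0 r257100"):
        print(sample, "->", has_activemap_fix(sample))
```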