From: Dmitry Morozovsky
To: Mikolaj Golub
Cc: "freebsd-fs@freebsd.org", Matt Churchyard
Date: Sun, 5 Oct 2014 18:50:57 +0400 (MSK)
Subject: Re: HAST with broken HDD
In-Reply-To: <20141003175439.GA7664@gmail.com>

On Fri, 3 Oct 2014, Mikolaj Golub wrote:

> Disk errors are recorded to syslog. Also error counters are displayed
> in `hastctl list' output.
> There is snmp_hast(3) in base -- a module
> for bsnmp to retrieve these statistics via the SNMP protocol (traps are not
> supported though).
>
> For notifications, hastd can be configured to execute an arbitrary
> command on various HAST events (see the description of `exec' in
> hast.conf(5)). Unfortunately, it does not currently have hooks for I/O error
> events. It might be worth adding, though. The problem with
> this is that it may generate too many events, so some throttling is
> needed.

And, it should be noted, this needs some kind of error coalescing or similar
before going from the "warning" state (there are some read errors, but
otherwise the disk is usable, and it would be too much hassle to switch to
the remote component completely) to the "error" state (the component is
unusable and needs to be replaced ASAP; drop it from the HAST pair, and
switch over if needed).

An error such as "device lost" is, of course, fatal from the very beginning;
but how should we interpret, say, sporadic controller resets where the disk
comes back and catches up on syncing again?

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: marck@FreeBSD.org ]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
------------------------------------------------------------------------
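
The warning/error escalation discussed in this message could be sketched roughly as below. This is only an illustration under stated assumptions, not anything hastd implements: the class name `ErrorCoalescer`, the sliding window length, and the error threshold are all hypothetical. Non-fatal errors are counted over a time window, a fatal error ("device lost") escalates immediately, and a value is returned only when the state changes, which doubles as event throttling.

```python
from collections import deque
from enum import Enum


class State(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


class ErrorCoalescer:
    """Hypothetical sliding-window classifier for one HAST component."""

    def __init__(self, window=60.0, error_threshold=5):
        self.window = window                    # seconds of history kept (assumed value)
        self.error_threshold = error_threshold  # errors per window meaning "error" (assumed)
        self.events = deque()                   # timestamps of recent non-fatal errors
        self.state = State.OK

    def _classify(self, now):
        # Forget errors that have aged out of the window.
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        if len(self.events) >= self.error_threshold:
            return State.ERROR
        return State.WARNING if self.events else State.OK

    def _transition(self, new_state):
        # Report only state changes, so repeated errors do not flood hooks.
        if new_state is self.state:
            return None
        self.state = new_state
        return new_state

    def record(self, now, fatal=False):
        """Record one error; return the new state if it changed, else None."""
        if self.state is State.ERROR:
            return None  # sticky: stays failed until the component is replaced
        if fatal:
            return self._transition(State.ERROR)
        self.events.append(now)
        return self._transition(self._classify(now))

    def poll(self, now):
        """Re-evaluate without a new error, so a warning can decay back to ok."""
        if self.state is State.ERROR:
            return None
        return self._transition(self._classify(now))
```

How sporadic controller resets should be counted is left open here, as in the message itself; in this sketch the caller decides whether to report them with `fatal=True` or as ordinary errors.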