From owner-freebsd-stable@FreeBSD.ORG  Sun Aug 21 05:00:56 2011
Date: Sat, 20 Aug 2011 22:00:51 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: perryh@pluto.rain.com
Message-ID: <20110821050051.GA47415@icarus.home.lan>
References: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
	<20110819232125.GA4965@icarus.home.lan>
	<B6B0AD0F-A74C-4F2C-88B0-101443D7831A@langille.org>
	<20110820032438.GA21925@icarus.home.lan>
	<4774BC00-F32B-4BF4-A955-3728F885CAA1@langille.org>
	<4E4FF4D6.1090305@os2.kiev.ua>
	<20110820183456.GA38317@icarus.home.lan>
	<4e50c931.gCNlQFqn5sVQXXax%perryh@pluto.rain.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4e50c931.gCNlQFqn5sVQXXax%perryh@pluto.rain.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: ml@os2.kiev.ua, freebsd-stable@freebsd.org, dan@langille.org,
	utisoft@gmail.com
Subject: Re: bad sector in gmirror HDD

On Sun, Aug 21, 2011 at 02:00:33AM -0700, perryh@pluto.rain.com wrote:
> Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:
> 
> > ... using dd to find the bad LBAs is the only choice he has.
> 
> or sysutils/diskcheckd.  It uses a 64KB blocksize, falling back to
> 512 -- to identify the bad LBA(s) -- after getting a failure when
> reading a large block, and IME it runs something like 10x faster
> than dd with bs=64k.
> 
> It would be advisable to check syslog configuration before using
> diskcheckd, since that is how it reports and there is reason to
> suspect that the as-shipped syslog.conf may discard at least some
> of diskcheckd's messages.

That software has a major problem where it runs constantly, rather than
periodically.  I know because I'm the one who opened the PR on it:

http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/115853

There's a discussion about this port/issue from a few days ago (how
sweet!):

http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069276.html

That thread includes comments from you stating that the software is
behaving as designed and that I misread the man page, but also stating
point blank that "either way the software runs continuously" (which is
what the PR was about in the first place):

http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069321.html

I closed the PR because when I left as a committer I no longer wanted to
deal with the issue.  I probably should have marked the PR as suspended,
but either way it's an ordeal that needs to get dealt with; it
absolutely should be re-opened in some way.

Then there's this PR, which I fully agree should have *nothing* to do
with gmirror, so I'm not even sure how to interpret what's written.
Furthermore, the author of this PR commented in PR 115853 stating
something completely different (read the first few lines very
carefully/slowly -- it seems to indicate he agrees with my PR, but then
opened up a separate PR with different wording):

http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/143566

Back to my PR.

I state that I set up diskcheckd.conf using the option you describe as
"a length of time over which to spread each pass", yet what happened was
that it did as much I/O as it could (read the entire disk in 45 minutes)
then proceeded to do it again (no sleep()).  That is not the same thing
as "do I/O over the course of 7 days".

Furthermore, the man page example gives this:

   EXAMPLES
       To check all of /dev/ad0 for errors once every two weeks, use
       this entry in diskcheckd.conf:

             /dev/ad0        *       14      *

That is no different from what I specified in my PR, other than that I
used a value of 7 and the example uses 14.  So what about the rest of
the man page?

   The second format consists of four white space separated fields,
   which are the full pathname of the disk device, the size of that disk,
   the frequency in days at which to check that disk, and the rate in
   kilobytes per second at which to check this disk.  Naturally, it would
   be contradictory to specify both the frequency and the rate, so only
   one of these should be specified.  Additionally, the size of the disk
   should not be specified if the rate is specified, as this information
   is unnecessary.

I did not misread the man page, especially given what's in EXAMPLES.
It's a bug somewhere -- either in the man page or the software itself.
This software will burn through your drive constantly, unless you use
the rate-in-kilobytes-per-second field.  The frequency field doesn't
work as advertised.
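
As a workaround, a value for the rate field can be derived from the disk
size and the number of days one pass should take.  This is a minimal
sketch in Python, not anything shipped with diskcheckd.  It assumes the
rate field counts units of 1024 bytes per second (the man page only says
"kilobytes per second"); the function name and the 500 GB disk size are
hypothetical and used purely for illustration:

```python
import math


def rate_kb_per_sec(disk_bytes: int, days: float) -> int:
    """Return the KB/s read rate that spreads one full pass over `days`.

    Assumes 1 KB = 1024 bytes.  Rounds up so the pass completes within
    the requested number of days rather than slightly after it.
    """
    if disk_bytes <= 0 or days <= 0:
        raise ValueError("disk_bytes and days must be positive")
    seconds = days * 86400  # seconds in the requested window
    return math.ceil(disk_bytes / 1024 / seconds)


if __name__ == "__main__":
    disk = 500 * 10**9  # hypothetical 500 GB disk
    for days in (7, 14):
        print(f"{days} days: {rate_kb_per_sec(disk, days)} KB/s")
```

For the hypothetical 500 GB disk this gives 808 KB/s for a 7-day pass
and 404 KB/s for a 14-day pass.  Going by the field order in the man
page, the matching entry would presumably put that number in the fourth
field with "*" for size and frequency; that is untested.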

And besides, such a utility really shouldn't be a daemon anyway but a
periodic(8)-called utility with appropriate locks put in place to ensure
more than one instance can't be run at once.
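
To illustrate the locking part, here is a minimal sketch in Python of a
wrapper that a periodic(8) script could invoke.  It is an assumption
about how such a utility might be structured, not existing code: the
lock file path and the run_exclusive() helper are hypothetical.  It
takes a non-blocking flock(2) on a lock file and skips the run if
another instance already holds it:

```python
import fcntl
import subprocess
import sys

LOCK_PATH = "/var/run/diskscan.lock"  # hypothetical lock file path


def run_exclusive(cmd, lock_path=LOCK_PATH):
    """Run cmd only if no other instance holds the lock.

    Returns the command's exit status, or None if the lock was busy.
    """
    with open(lock_path, "w") as lock:
        try:
            # Non-blocking exclusive lock; fails at once if already held.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # The lock is dropped when the file is closed on leaving the block.
        return subprocess.call(cmd)


if __name__ == "__main__":
    # Hypothetical usage: diskscan.py dd if=/dev/ad0 of=/dev/null bs=64k
    if len(sys.argv) > 1:
        status = run_exclusive(sys.argv[1:])
        sys.exit(0 if status is None else status)
```

A run that overlaps a still-active scan then exits quietly instead of
starting a second pass over the same disk.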

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |