From: Jeremy Chadwick
Date: Sat, 20 Aug 2011 22:00:51 -0700
To: perryh@pluto.rain.com
Cc: ml@os2.kiev.ua, freebsd-stable@freebsd.org, dan@langille.org, utisoft@gmail.com
Subject: Re: bad sector in gmirror HDD

On Sun, Aug 21, 2011 at 02:00:33AM -0700, perryh@pluto.rain.com wrote:
> Jeremy Chadwick wrote:
>
> > ... using dd to find the bad LBAs is the only choice he has.
>
> or sysutils/diskcheckd.  It uses a 64KB blocksize, falling back to
> 512 -- to identify the bad LBA(s) -- after getting a failure when
> reading a large block, and IME it runs something like 10x faster
> than dd with bs=64k.
>
> It would be advisable to check syslog configuration before using
> diskcheckd, since that is how it reports and there is reason to
> suspect that the as-shipped syslog.conf may discard at least some
> of diskcheckd's messages.

That software has a major problem where it runs constantly, rather than
periodically.  I know because I'm the one who opened the PR on it:

http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/115853

There's a discussion about this port/issue from a few days ago (how
sweet!):

http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069276.html

With comments from you stating that the software is behaving as designed
and that I misread the man page, but also stating point blank that
"either way the software runs continuously" (which is what the PR was
about in the first place):

http://lists.freebsd.org/pipermail/freebsd-ports/2011-August/069321.html

I closed the PR because when I left as a committer I no longer wanted to
deal with the issue.  I probably should have marked the PR as suspended,
but either way it's an ordeal that needs to get dealt with; it
absolutely should be re-opened in some way.

Then there's this PR, which I fully agree should have *nothing* to do
with gmirror, so I'm not even sure how to interpret what's written.
Furthermore, the author of this PR commented in PR 115853 stating
something completely different (read the first few lines very
carefully/slowly -- it seems to indicate he agrees with my PR, but then
opened up a separate PR with different wording):

http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/143566

Back to my PR.  I state that I set up diskcheckd.conf using the option
you describe as "a length of time over which to spread each pass", yet
what happened was that it did as much I/O as it could (read the entire
disk in 45 minutes) then proceeded to do it again (no sleep()).  That is
not the same thing as "do I/O over the course of 7 days".

Furthermore, the man page example gives this:

EXAMPLES
     To check all of /dev/ad0 for errors once every two weeks, use this
     entry in diskcheckd.conf:

           /dev/ad0   *   14   *

Which is no different than what I specified in my PR other than that I
used a value of 7 and the example uses 14.  So what about the rest of
the man page?

     The second format consists of four white space separated fields,
     which are the full pathname of the disk device, the size of that
     disk, the frequency in days at which to check that disk, and the
     rate in kilobytes per second at which to check this disk.
     Naturally, it would be contradictory to specify both the frequency
     and the rate, so only one of these should be specified.
     Additionally, the size of the disk should not be specified if the
     rate is specified, as this information is unnecessary.

I did not misread the man page, especially given what's in EXAMPLES.
It's a bug somewhere -- either in the man page or the software itself.

This software will burn through your drive constantly, unless you use
the rate-in-kilobytes-per-second field.  The frequency field doesn't
work as advertised.

And besides, such a utility really shouldn't be a daemon anyway but a
periodic(8)-called utility with appropriate locks put in place to ensure
more than one instance can't be run at once.
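
For anyone who wants to fall back on the rate field in the meantime,
here's a rough sketch of the arithmetic: pick the number of days you
want one pass to take and work out the KB/s rate that would spread a
full read of the disk over that period.  The helper names, the 500 GB
disk size, and the choice of 1024 bytes per kilobyte are all my own
assumptions for illustration; I haven't checked which definition of
"kilobyte" diskcheckd uses.  The resulting config line follows the
four-field format quoted from the man page above, with "*" for the
fields left unspecified (as the man page's own example does).

```python
SECONDS_PER_DAY = 86400


def rate_kb_per_sec(disk_bytes: int, days: int, kb: int = 1024) -> int:
    """Smallest whole KB/s rate that reads disk_bytes within `days` days.

    Assumption: one kilobyte is 1024 bytes; pass kb=1000 otherwise.
    """
    if disk_bytes <= 0 or days <= 0:
        raise ValueError("disk_bytes and days must be positive")
    bytes_per_rate_unit = kb * days * SECONDS_PER_DAY
    # Integer ceiling division, so the pass finishes within the window.
    return -(-disk_bytes // bytes_per_rate_unit)


def conf_line(device: str, rate: int) -> str:
    """Format a diskcheckd.conf entry: device, size, frequency, rate.

    Size and frequency are left as '*' because only the rate is given.
    """
    return f"{device} * * {rate}"


if __name__ == "__main__":
    # Hypothetical 500 GB disk, one full pass spread over 7 days.
    rate = rate_kb_per_sec(500 * 10**9, 7)
    print(conf_line("/dev/ad0", rate))
```

Whether diskcheckd actually honours a rate this low is something you'd
want to confirm by watching the I/O yourself; this only does the math.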
-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |