From owner-freebsd-stable Wed Jul 8 14:50:08 1998
Return-Path:
Received: (from majordom@localhost)
    by hub.freebsd.org (8.8.8/8.8.8) id OAA03286
    for freebsd-stable-outgoing; Wed, 8 Jul 1998 14:50:08 -0700 (PDT)
    (envelope-from owner-freebsd-stable@FreeBSD.ORG)
Received: from pop.asahi-net.or.jp (pop.asahi-net.or.jp [202.224.39.6])
    by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id OAA03204
    for ; Wed, 8 Jul 1998 14:50:01 -0700 (PDT)
    (envelope-from tfuruya@ppp142009.asahi-net.or.jp)
Received: from galois.tf.or.jp (ppp142009.asahi-net.or.jp [202.213.142.9])
    by pop.asahi-net.or.jp (8.8.8/3.6W) with ESMTP id GAA22500;
    Thu, 9 Jul 1998 06:55:24 +0900
Received: from galois.tf.or.jp (localhost.tf.or.jp [127.0.0.1])
    by galois.tf.or.jp (8.8.8/3.6W-ht5t-fry@asahi-net-98042218)
    with ESMTP id GAA01464; Thu, 9 Jul 1998 06:49:32 +0900 (JST)
Message-Id: <199807082149.GAA01464@galois.tf.or.jp>
To: smarzloff@carif-idf.org
Cc: freebsd-stable@FreeBSD.ORG, Tetsuro FURUYA
Subject: Re: Disk problem.
From: Tetsuro FURUYA
Reply-To: Tetsuro FURUYA
In-Reply-To: Your message of "Wed, 8 Jul 1998 17:30:36 +0200"
References: <19980708173036.A14305@rafiki.intranet.carif.asso.fr>
X-Mailer: Mew version 1.54 on Emacs 19.28.1, Mule 2.3
X-fingerprint: F1 BA 5F C1 C2 48 1D C7 AE 5F 16 ED 12 17 75 38
X-URL: http://sodan.komaba.ecc.u-tokyo.ac.jp/~tfuruya/
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Date: Thu, 09 Jul 1998 06:49:32 +0900
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Stephane Marzloff wrote:
> Hi..
>
> I have a problem with a 2.2.6-STABLE (6 Jul) on a Ppro 200.
>
> Sometimes, when I launch some applications (mutt, ls, vmstat..), there is no
> responses during 10 sec.
> I suspect a disk problem.
>
> The machine isn't charge, Load average is constantly : 0.00 (0.50 maximum).
> There 18Mo of Free RAM.
>
> And 5 minutes ago, I have this message on the console :
> Jul 8 17:07:46 rafiki /kernel: wd0: interrupt timeout:
> Jul 8 17:07:46 rafiki /kernel: wd0: interrupt timeout:
> Jul 8 17:07:46 rafiki /kernel: wd0: status 50 error 0
> Jul 8 17:07:46 rafiki /kernel: wd0: status 50 error 0

Your IDE disk has a bad sector.
Try
    bad144 -s -v /dev/wd0
or
    badsect & fsck
(This is rather difficult, so please read the man pages.)

If the system hangs during disk access:
1) Build a kernel with the kernel debugger ddb compiled in.
   When the system hangs, type ctrl-alt-esc to get into ddb, and
   wait until disk access stops, about 20-60 seconds (this depends
   on the system).
   Then type 'c' to continue bad144 or fsck.
2) Patch /usr/src/sys/i386/isa/wd.c.
   See the mail below.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Message-Id: <199806102228.PAA00747@dingo.cdrom.com>
X-Mailer: exmh version 2.0zeta 7/24/97
To: Tetsuro FURUYA
cc: mike@smith.net.au, robinson@public.bta.net.cn,
    freebsd-stable@freebsd.org, freebsd-questions@freebsd.org,
    Tetsuro FURUYA
Subject: Re: Bug in wd driver
In-reply-to: Your message of "Thu, 11 Jun 1998 04:41:08 +0900."
    <199806101941.EAA11696@dilemma.tf.or.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 10 Jun 1998 15:28:29 -0700
From: Mike Smith
Sender: owner-freebsd-stable@freebsd.org
X-Loop: FreeBSD.ORG

> > > >fsck /usr
> > > >.....
> > > >wd0: interrupt timeout:
> > > >wd0: status 50 error 0
> > > >wd0: interrupt timeout:
> > > >wd0: status 50 error 1
> > >
> > > >===> hang up
> > > >===> type 'cntrl-alt-esc'
> >
> > This defers the interrupt timeout...
> >
> > > >db>wd0s1f: hard error reading fsbn 1152850 of 1152850-1152851(wd0s1 bn
> > > >1279826; cn 317 tn 26 sn 44)
> > > >wd0: status 59 error 40
> >
> > ... but not the interrupt, which finally arrives and contains real
> > error information. Note that the interrupt timeouts in your case
> > *don't* have DRQ set. Are you running in multi-block mode?
> >
> > > As for wd.c source, I will try to experiment :)
> >
> > Please do. It looks like your information may lead to a result here.
>
> It seems too late for writing reply to mailing list.

Not at all; better late than never!

> But, this seems important to note-users, so I dare to report the result of
> my experiment of patch to /usr/src/sys/i386/isa/wd.c
> which Mr. Mike Smith's stated, ...
> >     if (wdtab[ctrlr].b_errcnt == 0)
> >         du->dk_timeout = 1 + 10;
> >     else
> >         du->dk_timeout = 1 + 3;     <---- Only this line.
> >
> >
> >Increase the 10 and 3 values (first and subsequent timeouts). Try
> >raising them lots, then come down slowly.
>
> Unfortunately, my /usr/src/sys/i386/isa/wd.c is different
> from the above source code.
> There is just only the last line in the wd.c.
>
> So, I rewrite only this last line, and increased 3 to 50. ( Is this OK?)

It's just a number, and you're in the best position to determine
whether it's big enough.

> Up to now, I have not yet experienced any disk crash, nor cannot-mount-root
> problem, nor anything bad else.

Excellent! And thanks for confirming this. I hope that the original
plaintiff is in a position to try this themselves - I would be more
than happy to be completely wrong about the situation. 8)

> You have written that
> >raising them lots, then come down slowly.
>
> Is there any inconvenience when du->dk_timeout value is
> very large ?
> What if du->dk_timeout value is too large ?

The only inconvenience is in the case where the disk has truly failed
to generate an interrupt, and the delay involved before reporting the
failure.

> What is this du->dk_timeout ?

It determines how long a disk is allowed to take to complete a command.

> I've just tried 'cd /usr; badsect BAD 1152850 1215577' & 'fsck /dev/rwd0s1f',
> but 'bad144 -s -v /dev/wd0' should work fine.
> ( I had often used bad144.
> But now, my bad sectors of wd0 become too many
> for bad144 :( )
> badsect & fsck don't take care of swap area,
> nevertheless they are working fine now :)
>
> So, Thank you Mr. Mike Smith !

No, definitely this time the thanks are for you. I'll look at
increasing this timeout significantly for both -stable and -current, if
someone doesn't beat me to it.

--
\\ Sometimes you're ahead,       \\ Mike Smith
\\ sometimes you're behind.      \\ mike@smith.net.au
\\ The race is long, and in the  \\ msmith@freebsd.org
\\ end it's only with yourself.  \\ msmith@cdrom.com

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message

========================================================================
TEL: 048-852-3520 FAX: 048-858-1597
E-Mail: ht5t-fry@asahi-net.or.jp
        tfu@ff.iij4u.or.jp
pgp-fingerprint:
pub Tetsuro FURUYA
Key fingerprint = F1 BA 5F C1 C2 48 1D C7 AE 5F 16 ED 12 17 75 38
=========================================================================
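
For readers who want to see the wd.c timeout change from the forwarded
mail in one place, here is a minimal sketch in C. It only mirrors the
if/else fragment quoted in that mail, with the retry value raised from
3 to 50 as reported in the thread. The names WD_FIRST_TIMEOUT,
WD_RETRY_TIMEOUT and wd_pick_timeout() are hypothetical and do not
exist in the driver. The thread does not state the units of
dk_timeout, and some versions of wd.c contain only the single
"1 + 3" assignment, so treat this as an illustration and not as the
actual patch.

```c
/*
 * Sketch only: mirrors the timeout selection quoted from wd.c in the
 * forwarded mail. All names below are hypothetical, chosen for
 * illustration; they are not part of the real driver.
 */
#define WD_FIRST_TIMEOUT 10	/* first attempt, as in the quoted fragment */
#define WD_RETRY_TIMEOUT 50	/* retries: 3 in the quoted fragment, 50 in the thread */

/* Return the value the quoted code would store in du->dk_timeout. */
static int
wd_pick_timeout(int errcnt)
{
	if (errcnt == 0)
		return (1 + WD_FIRST_TIMEOUT);
	return (1 + WD_RETRY_TIMEOUT);
}
```

With the values above, wd_pick_timeout(0) gives 11 and any non-zero
error count gives 51. As the forwarded mail says, the cost of a large
value is only a longer delay before a truly missing interrupt is
reported.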