Date:      Thu, 09 Jul 1998 06:49:32 +0900
From:      Tetsuro FURUYA <ht5t-fry@asahi-net.or.jp>
To:        smarzloff@carif-idf.org
Cc:        freebsd-stable@FreeBSD.ORG, Tetsuro FURUYA <tfu@ff.iij4u.or.jp>
Subject:   Re: Disk problem.
Message-ID:  <199807082149.GAA01464@galois.tf.or.jp>
In-Reply-To: Your message of "Wed, 8 Jul 1998 17:30:36 +0200"
References:  <19980708173036.A14305@rafiki.intranet.carif.asso.fr>

Stephane Marzloff <smarzloff@carif-idf.org> wrote:

> Hi..
> 
> I have a problem with a 2.2.6-STABLE (6 Jul) on a Ppro 200.
> 
> Sometimes, when I launch some applications (mutt, ls, vmstat..), there is no
> responses during 10 sec.
> I suspect a disk problem.
> 
> The machine isn't charge, Load average is constantly : 0.00 (0.50 maximum).
> There 18Mo of Free RAM.
> 
> And 5 minutes ago, I have this message on the console :
> Jul  8 17:07:46 rafiki /kernel: wd0: interrupt timeout:
> Jul  8 17:07:46 rafiki /kernel: wd0: interrupt timeout:
> Jul  8 17:07:46 rafiki /kernel: wd0: status 50<rdy,seekdone> error 0
> Jul  8 17:07:46 rafiki /kernel: wd0: status 50<rdy,seekdone> error 0

Your IDE disk has bad sectors.
Try
bad144 -s -v /dev/wd0
or
badsect and fsck (this is rather difficult, so please read the man pages first).

If the system hangs during disk access:
1) Build a kernel with the kernel debugger ddb compiled in.
   When the system hangs, press Ctrl-Alt-Esc to get into ddb,
   and wait until the disk access stops, about 20-60 seconds (this
   depends on the system).
   Then type 'c' to continue, and bad144 or fsck will carry on.


2) Patch /usr/src/sys/i386/isa/wd.c.
   See the following mail.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Message-Id: <199806102228.PAA00747@dingo.cdrom.com>
X-Mailer: exmh version 2.0zeta 7/24/97
To: Tetsuro FURUYA <ht5t-fry@asahi-net.or.jp>
cc: mike@smith.net.au,
 robinson@public.bta.net.cn,
 freebsd-stable@freebsd.org,
 freebsd-questions@freebsd.org,
 Tetsuro FURUYA <tfu@ff.iij4u.or.jp>
Subject: Re: Bug in wd driver 
In-reply-to: Your message of "Thu, 11 Jun 1998 04:41:08 +0900."
             <199806101941.EAA11696@dilemma.tf.or.jp> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 10 Jun 1998 15:28:29 -0700
From: Mike Smith <mike@smith.net.au>
Sender: owner-freebsd-stable@freebsd.org
X-Loop: FreeBSD.ORG

> > > >fsck /usr
> > > >.....
> > > >wd0: interrupt timeout:
> > > >wd0: status 50<rdy,seekdone> error 0
> > > >wd0: interrupt timeout:
> > > >wd0: status 50<rdy,seekdone> error 1<no_dam>
> > > 
> > > >===> hang up
> > > >===> type 'cntrl-alt-esc'
> > 
> > This defers the interrupt timeout...
> > 
> > > >db>wd0s1f: hard error reading fsbn 1152850 of 1152850-1152851(wd0s1 bn
> > > >1279826; cn 317 tn 26 sn 44)
> > > >wd0: status 59<rdy,seekdone,drq,err> error 40<uncorr>
> > 
> > ... but not the interrupt, which finally arrives and contains real 
> > error information.  Note that the interrupt timeouts in your case 
> > *don't* have DRQ set.  Are you running in multi-block mode?
> > 
> > > As for wd.c source, I will try to experiment :)
> > 
> > Please do.  It looks like your information may lead to a result here.  
> 
> It seems too late for writing reply to mailing list.

Not at all; better late than never!

> But, this seems important to note-users, so I dare to report the result of
> my experiment of patch to /usr/src/sys/i386/isa/wd.c
> which Mr. Mike Smith's stated,
...
> >        if (wdtab[ctrlr].b_errcnt == 0)
> >                du->dk_timeout = 1 + 10;
> >        else
> >                du->dk_timeout = 1 + 3;   <---- Only this line.
> >
> >
> >Increase the 10 and 3 values (first and subsequent timeouts).  Try 
> >raising them lots, then come down slowly.
> 
> Unfortunately, my /usr/src/sys/i386/isa/wd.c is different
> from the above source code.
> There is just only the last line in the wd.c.
> 
> So, I rewrite only this last line, and increased 3 to 50. ( Is this OK?)

It's just a number, and you're in the best position to determine 
whether it's big enough.
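
For readers who want to experiment with both values rather than edit
literals in place, the quoted if/else can be sketched as a small helper.
This is only an illustration of the logic in the quoted fragment, not
code from wd.c: the function name, the macro names and the parameters
are hypothetical, and the "1 +" is simply carried over from the quoted
lines.

```c
#include <assert.h>

/* Hypothetical names for the literals 10 and 3 in the quoted fragment. */
#define FIRST_TIMEOUT_SECS 10
#define RETRY_TIMEOUT_SECS 3

/*
 * Mirrors the quoted if/else: the first attempt (errcnt == 0) gets the
 * longer allowance, retries get the shorter one.  The "1 +" is kept
 * from the quoted code.
 */
static int
pick_timeout(int errcnt, int first_secs, int retry_secs)
{
	assert(first_secs > 0 && retry_secs > 0);
	return 1 + (errcnt == 0 ? first_secs : retry_secs);
}
```

With the defaults, pick_timeout(0, FIRST_TIMEOUT_SECS, RETRY_TIMEOUT_SECS)
gives 11 and any retry gives 4; raising the retry value from 3 to 50, as
described above, gives 51 on retries.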

> Up to now, I have not yet experienced any disk crash, nor cannot-mount-root
> problem, nor anything bad else.

Excellent!  And thanks for confirming this.  I hope that the original 
plaintiff is in a position to try this themselves - I would be more 
than happy to be completely wrong about the situation.  8)

> You have written that 
> >raising them lots, then come down slowly.
> 
> Is there any inconvenience when du->dk_timeout value is
> very large ?
> What if du->dk_timeout value is too large ?

The only inconvenience is in the case where the disk has truly failed 
to generate an interrupt, and the delay involved before reporting the 
failure.

> What is this du->dk_timeout ?

It determines how long a disk is allowed to take to complete a command.
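
A rough model of such a countdown may make the trade-off clearer.  The
sketch below assumes the value is decremented once per tick (for
example once per second) by a periodic callback, and that reaching zero
is what triggers the "interrupt timeout" report; the thread does not
spell this out, and every name here is hypothetical rather than taken
from wd.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-disk state; only the timeout
 * counter is modelled. */
struct disk_model {
	int timeout;	/* ticks left before a timeout; 0 means disarmed */
};

/*
 * Called once per tick.  Returns true exactly when an armed countdown
 * reaches zero, i.e. when the command has overrun its allowance.
 */
static bool
watchdog_tick(struct disk_model *d)
{
	assert(d != NULL);
	if (d->timeout != 0 && --d->timeout == 0)
		return true;
	return false;
}

/* Called when the completion interrupt arrives in time: disarm. */
static void
command_done(struct disk_model *d)
{
	assert(d != NULL);
	d->timeout = 0;
}

/* Number of ticks before a timeout fires if the command never completes. */
static int
ticks_until_timeout(int initial)
{
	struct disk_model d = { initial };
	int n = 0;

	assert(initial >= 0);
	while (d.timeout != 0) {
		n++;
		if (watchdog_tick(&d))
			break;
	}
	return n;
}
```

Under these assumptions an initial value of 1 + 3 fires after 4 ticks
and 1 + 50 after 51 ticks, which is the only cost of a large value: a
disk that really has failed is reported that much later.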

> I've just tried 'cd /usr; badsect BAD 1152850 1215577' & 'fsck /dev/rwd0s1f',
>  but 'bad144 -s -v /dev/wd0' should work fine. 
> ( I had often used bad144. But now, my bad sectors of wd0 become too many
>  for bad144 :( )
> badsect & fsck don't take care of swap area,
>  nevertheless they are working fine now :)
> 
> So, Thank you Mr. Mike Smith !

No, definitely this time the thanks are for you.  I'll look at
increasing this timeout significantly for both -stable and -current, if 
someone doesn't beat me to it.

-- 
\\  Sometimes you're ahead,       \\  Mike Smith
\\  sometimes you're behind.      \\  mike@smith.net.au
\\  The race is long, and in the  \\  msmith@freebsd.org
\\  end it's only with yourself.  \\  msmith@cdrom.com

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message

========================================================================
TEL: 048-852-3520    FAX: 048-858-1597
E-Mail:
     ht5t-fry@asahi-net.or.jp
     tfu@ff.iij4u.or.jp
pgp-fingerprint:
     pub  Tetsuro FURUYA <ht5t-fry@asahi-net.or.jp>
      Key fingerprint = F1 BA 5F C1 C2 48 1D C7  AE 5F 16 ED 12 17 75 38
=========================================================================
