Date:      Thu, 27 May 1999 21:12:03 -0600 (MDT)
From:      "Kenneth D. Merry" <ken@plutotech.com>
To:        dkelly@hiwaay.net (David Kelly)
Cc:        freebsd-scsi@FreeBSD.ORG
Subject:   Re: proper mode page values?
Message-ID:  <199905280312.VAA23315@panzer.plutotech.com>
In-Reply-To: <199905280155.UAA53624@nospam.hiwaay.net> from David Kelly at "May 27, 1999 08:55:48 pm"

David Kelly wrote...
> "Kenneth D. Merry" writes:
> > > swap_pager: indefinite wait buffer: device: 0x30401, blkno: 264, size: 4096
> > > 
> > > The problem blocks are always 264, 272, and 496.
> > 
> > You should also be getting some SCSI error message printed on the console.
> 
> Actually that is what was being written to the console and 
> /var/log/messages. I don't remember any other messages but the problem 
> is easily repeatable (might take an hour, and the machine is at work 
> while my email is at home).
> 
> System spit out more than 10 or 15 of those messages before it locked
> up. Meanwhile it was getting slower. X was not running. Could Alt-Fn
> between virtual consoles. Could control-alt-esc and get the "no kernel
> debugger" message. If I could read my whole tape then I could update 
> the system and install the kernel debugger too. System doesn't have 
> sources on it at the moment.  :-(
> 
> > That's rather odd.  It may be that the Anaconda is staying on the bus too
> > long or something.  I dunno.
> 
> That's what I'm thinking. At home I have my tape drives on a narrow 
> Adaptec 2940, the twin of the 2940 in the work machine. And a matching 
> Anaconda in both places. But at home the HD is on a wide Symbios 875.
> 
> > > Tried using camcontrol to view my bad block lists. Doesn't work on that 
> > > IBM drive, nor the IBM drive on this machine:
> > > 
> [...]
> > > nospam: [1037] camcontrol defects -n da -u 0 -f block -P
> > > error reading defect list: Input/output error
> > 
> > You need to use the -v switch on the command line to see why the command is
> > failing.
> 
> Fair enough. Doesn't look like -v adds much information:
> 
> nospam: [1045]  camcontrol defects -v -n da -u 0 -f block -G
> error reading defect list: Input/output error
> CAM status is 0
> nospam: [1046] camcontrol defects -n da -u 0 -f block -G
> error reading defect list: Input/output error
> nospam: [1047] id
> uid=0(root) gid=0(wheel) groups=0(wheel), 2(kmem), 3(sys), 4(tty), 5(operator), 20(staff), 31(guest)
> nospam: [1048] 
> 
> Ah! Forgot to check /var/log/messages. This is the output for a single 
> attempt at "camcontrol defects", the one listed above:
> 
> May 27 19:12:22 nospam /kernel: (pass2:ncr0:0:0:0): extraneous data discarded.
> May 27 19:12:22 nospam /kernel: (pass2:ncr0:0:0:0): COMMAND FAILED (9 80) @0xc0abbe00.
> May 27 19:12:22 nospam /kernel: (pass2:ncr0:0:0:0): extraneous data discarded.
> May 27 19:12:22 nospam /kernel: (pass2:ncr0:0:0:0): COMMAND FAILED (9 80) @0xc0abbe00.

Hmm, that helps a little, but not much.  I don't know how to decipher the
NCR driver error messages.

If someone knows what those NCR messages mean, it might shed some light on
things.  You could also try hooking the disks up to an Adaptec controller.
(At least I generally understand the errors the Adaptec driver spits out,
or can ask Justin what they mean.)

> > > So then I go looking at mode pages to see what is set and to see if by 
> > > any chance the drive was told not to substitute replacements for 
> > > weakening blocks:
> > > 
> > > nospam: [1038] camcontrol modepage -n da -u 0 -m 1 -P 2
> > > AWRE (Auto Write Reallocation Enbld):  0 
> > > ARRE (Auto Read Reallocation Enbld):  0 
> > 
> > Those are the defaults.
> > 
> > > nospam: [1039] camcontrol modepage -n da -u 0 -m 1 -P 3
> > > AWRE (Auto Write Reallocation Enbld):  1 
> > > ARRE (Auto Read Reallocation Enbld):  1 
> > 
> > And these are the saved parameters.
> 
> Yes, I understood defaults and saved. The observation was the 9G drive
> was shipped with different saved (-P 3) values than the factory defaults
> (-P 2). The 9G saved values appear to be the same as the 4G defaults
> which were also the same as its saved values.
> 
> So sifting thru everything, it would appear AWRE and ARRE are Good 
> Things To Set?

Yes, they generally are, unless you want to remap bad blocks on your own.
(99.9% of people will want to have the drive do it automatically.)

> How about "TB (Transfer Block)"? That sounds like one that will attempt 
> to copy (transfer) the contents of a sick but not dead block. But then 
> AWRE and ARRE sound like they do that too.

Actually, that means:

===========================================================================
A transfer block (TB) bit of one indicates that a data block that is not
recovered within the recovery limits specified shall be transferred to the
initiator before CHECK CONDITION status is returned. A TB bit of zero
indicates that such a data block shall not be transferred to the initiator.
The TB bit does not affect the action taken for recovered data. 
===========================================================================

That's probably not what you're looking for.

> Or maybe "EER (Enable Early Recovery)" is one that will attempt to 
> recover and repair before the damage is permanent?
> 
> "DTE (Disable Transfer on Error)", now why would we enable something 
> like TB then use a different parameter to disable it? This must mean 
> something totally different. I'm confused.

I'd suggest looking at the SCSI-2 or SCSI-3 spec.  The SCSI-2 spec (i.e.
the one I've got handy) has a reasonably long section describing mode page
1 and the various things you can set and what they do.
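
For orientation, here is a minimal sketch of how the flags discussed above
are packed, assuming the SCSI-2 layout of byte 2 of mode page 1 (the
read-write error recovery page). The function and table names are made up
for illustration; check the bit positions against the spec for your drive.

```python
# Bit positions within byte 2 of mode page 0x01, assuming the SCSI-2 layout.
ERROR_RECOVERY_BITS = [
    ("AWRE", 7),  # automatic write reallocation enabled
    ("ARRE", 6),  # automatic read reallocation enabled
    ("TB", 5),    # transfer block
    ("RC", 4),    # read continuous
    ("EER", 3),   # enable early recovery
    ("PER", 2),   # post error
    ("DTE", 1),   # disable transfer on error
    ("DCR", 0),   # disable correction
]


def decode_error_recovery_flags(byte2):
    """Split byte 2 of mode page 1 into its named one-bit flags."""
    return {name: (byte2 >> bit) & 1 for name, bit in ERROR_RECOVERY_BITS}


if __name__ == "__main__":
    # 0xC0 has only the top two bits set: AWRE=1 and ARRE=1, everything else 0.
    for name, value in decode_error_recovery_flags(0xC0).items():
        print(f"{name}: {value}")
```

A byte of 0xC0 corresponds to the saved values shown earlier (AWRE and ARRE
both 1) with the remaining flags cleared.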

> > 1. Enable read and write reallocation, and then do a dd to overwrite the
> >    entire disk.  That will force any bad blocks to get remapped.
> 
> With AWRE and ARRE enabled in the first place I should never need to do 
> the above? Right? The advantage of scanning the whole disk at once as 
> above is to verify there are no problems and/or to observe the 
> automatic bad block replacement doing its thing?

Having read and write reallocation enabled doesn't necessarily mean you'll
never run into bad blocks.  The drive won't remap a block if it can't recover
the data that was in it.

That isn't so much a problem with write reallocation, since the data is
being written, and therefore is valid.  With read reallocation, though, the
block can go bad and there may be no way for the drive to recover it.
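
The distinction can be written down as a toy decision rule. This is only a
simplified model of the behaviour described above, not what any particular
drive's firmware does, and the function name is hypothetical.

```python
def drive_will_remap(op, data_recovered, awre, arre):
    """Toy model: does the drive reallocate a failing block by itself?

    op             -- "read" or "write"
    data_recovered -- True if the drive managed to recover the old contents
    awre, arre     -- the AWRE and ARRE bits from mode page 1
    """
    if op == "write":
        # The data being written is valid by definition, so the drive can
        # always place it in a spare block when write reallocation is on.
        return bool(awre)
    if op == "read":
        # On a read the drive needs the old contents; if it cannot recover
        # them, it has nothing to put in the replacement block.
        return bool(arre) and bool(data_recovered)
    raise ValueError("op must be 'read' or 'write'")


if __name__ == "__main__":
    print(drive_will_remap("write", False, awre=1, arre=1))
    print(drive_will_remap("read", False, awre=1, arre=1))
```

The second case, a read of a block whose data cannot be recovered, is the one
that still produces errors even with both bits enabled.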

It's those blocks that you'll most likely get errors about, and it's those
blocks that the above procedure will force to get remapped.  Going over the
whole drive at once is just a quick and dirty way to force any bad blocks
on the disk to get remapped.  You can also try to read every block on the
disk, and then write to just the blocks that the disk complains about.  I
used that procedure recently to fix a bad block on a disk.
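
A rough sketch of the read pass of that second procedure, in Python. It is
an illustration under a few assumptions: the device (or a test file) can be
opened read-only, its size can be found by seeking to the end unless you
pass total_blocks yourself, and a read error on a block shows up as an
OSError. The function name and the 512-byte default are placeholders.

```python
import os


def find_unreadable_blocks(path, block_size=512, total_blocks=None):
    """Read every block of `path` once.

    Returns (blocks_scanned, bad_blocks), where bad_blocks lists the
    numbers of the blocks whose read raised an OSError.
    """
    bad_blocks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        if total_blocks is None:
            # Assumption: seeking to the end reports the size. Pass
            # total_blocks explicitly if that does not hold for your device.
            size = os.lseek(fd, 0, os.SEEK_END)
            total_blocks = (size + block_size - 1) // block_size
        for blkno in range(total_blocks):
            try:
                os.pread(fd, block_size, blkno * block_size)
            except OSError:
                # Note the block and carry on with the rest of the disk.
                bad_blocks.append(blkno)
    finally:
        os.close(fd)
    return total_blocks, bad_blocks
```

The blocks it reports are the ones to write to afterwards. Writing to them
discards whatever was stored there, so the sketch deliberately stops at
reporting.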

> > > Are my modepage parameters sane? Was looking at page 0x01 because I was 
> > > worried about error handling. But here's the popular 0x08 too:
> [...]
> > 
> > Looks okay to me.  The only one you might want to play with is the WCE bit,
> > which enables write caching.  That won't have any effect
> 
> WCE is the only one I've played with. Had to use bonnie to tell the 
> difference. So I put it back the way I found it.

Yeah, either way will generally work.  It depends on what sort of
performance you get and what you feel comfortable with.

Ken
-- 
Kenneth Merry
ken@plutotech.com

