Date:      Thu, 27 May 1999 20:55:48 -0500
From:      David Kelly <dkelly@hiwaay.net>
To:        "Kenneth D. Merry" <ken@plutotech.com>
Cc:        freebsd-scsi@FreeBSD.ORG
Subject:   Re: proper mode page values? 
Message-ID:  <199905280155.UAA53624@nospam.hiwaay.net>
In-Reply-To: Message from "Kenneth D. Merry" <ken@plutotech.com>  of "Wed, 26 May 1999 20:06:06 MDT." <199905270206.UAA16406@panzer.plutotech.com> 

"Kenneth D. Merry" writes:
> > swap_pager: indefinite wait buffer: device: 0x30401, blkno: 264, size: 4096
> > 
> > The problem blocks are always 264, 272, and 496.
> 
> You should also be getting some SCSI error message printed on the console.

Actually that is what was being written to the console and 
/var/log/messages. I don't remember any other messages but the problem 
is easily repeatable (might take an hour, and the machine is at work 
while my email is at home).

System spat out more than 10 or 15 of those messages before it locked
up. Meanwhile it was getting slower. X was not running. Could Alt-Fn
between virtual consoles. Could control-alt-esc and get the "no kernel
debugger" message. If I could read my whole tape then I could update 
the system and install the kernel debugger too. System doesn't have 
sources on it at the moment.  :-(

> That's rather odd.  It may be that the Anaconda is staying on the bus too
> long or something.  I dunno.

That's what I'm thinking. At home I have my tape drives on a narrow 
Adaptec 2940, the twin of the 2940 in the work machine. And a matching 
Anaconda in both places. But at home the HD is on a wide Symbios 875.

> > Tried using camcontrol to view my bad block lists. Doesn't work on that 
> > IBM drive, nor the IBM drive on this machine:
> > 
[...]
> > nospam: [1037] camcontrol defects -n da -u 0 -f block -P
> > error reading defect list: Input/output error
> 
> You need to use the -v switch on the command line to see why the command is
> failing.

Fair enough. Doesn't look like -v adds much information:

nospam: [1045]  camcontrol defects -v -n da -u 0 -f block -G
error reading defect list: Input/output error
CAM status is 0
nospam: [1046] camcontrol defects -n da -u 0 -f block -G
error reading defect list: Input/output error
nospam: [1047] id
uid=0(root) gid=0(wheel) groups=0(wheel), 2(kmem), 3(sys), 4(tty), 5(operator), 20(staff), 31(guest)
nospam: [1048] 

Ah! Forgot to check /var/log/messages. This is the output for a single 
attempt at "camcontrol defects", the one listed above:

May 27 19:12:22 nospam /kernel: (pass2:ncr0:0:0:0): extraneous data discarded.
May 27 19:12:22 nospam /kernel: (pass2:ncr0:0:0:0): COMMAND FAILED (9 80) @0xc0abbe00.
May 27 19:12:22 nospam /kernel: (pass2:ncr0:0:0:0): extraneous data discarded.
May 27 19:12:22 nospam /kernel: (pass2:ncr0:0:0:0): COMMAND FAILED (9 80) @0xc0abbe00.

> > So then I go looking at mode pages to see what is set and to see if by 
> > any chance the drive was told not to substitute replacements for 
> > weakening blocks:
> > 
> > nospam: [1038] camcontrol modepage -n da -u 0 -m 1 -P 2
> > AWRE (Auto Write Reallocation Enbld):  0 
> > ARRE (Auto Read Reallocation Enbld):  0 
> 
> Those are the defaults.
> 
> > nospam: [1039] camcontrol modepage -n da -u 0 -m 1 -P 3
> > AWRE (Auto Write Reallocation Enbld):  1 
> > ARRE (Auto Read Reallocation Enbld):  1 
> 
> And these are the saved parameters.

Yes, I understood defaults and saved. The observation was that the 9G
drive was shipped with saved values (-P 3) that differ from its factory
defaults (-P 2). The 9G's saved values appear to match the 4G's
defaults, which in turn match the 4G's own saved values.
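
To make the comparison concrete, here is a minimal sketch that diffs two
sets of mode page fields. It assumes the usual camcontrol page control
numbering (0 current, 1 changeable, 2 default, 3 saved); the helper name
diff_fields and the two dictionaries are hypothetical, filled in by hand
from the AWRE/ARRE values quoted above rather than read from a drive.

```python
# Page control values as passed to "camcontrol modepage -P".
PAGE_CONTROL = {0: "current", 1: "changeable", 2: "default", 3: "saved"}


def diff_fields(first, second):
    """Return {field: (first_value, second_value)} for fields that differ.

    Hypothetical helper: only fields present in both dictionaries are
    compared, so a partial listing does not produce false differences.
    """
    return {
        name: (first[name], second[name])
        for name in first
        if name in second and first[name] != second[name]
    }


# Values typed in from the quoted -P 2 and -P 3 output, not queried live.
defaults = {"AWRE": 0, "ARRE": 0}
saved = {"AWRE": 1, "ARRE": 1}

for name, (d, s) in diff_fields(defaults, saved).items():
    print(f"{name}: {PAGE_CONTROL[2]}={d} {PAGE_CONTROL[3]}={s}")
```

Anything the loop prints is a field where the saved setting was changed
away from the factory default.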

So sifting thru everything, it would appear AWRE and ARRE are Good 
Things To Set?

How about "TB (Transfer Block)"? That sounds like one that will attempt 
to copy (transfer) the contents of a sick but not dead block. But then 
AWRE and ARRE sound like they do that too.

Or maybe "EER (Enable Early Recovery)" is one that will attempt to 
recover and repair before the damage is permanent?

"DTE (Disable Transfer on Error)", now why would we enable something 
like TB then use a different parameter to disable it? This must mean 
something totally different. I'm confused.
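
For reference, all of these flags sit in one byte of page 0x01. The
sketch below only shows where each bit lives, assuming the SCSI-2 layout
of byte 2 of the Read-Write Error Recovery page; it says nothing about
what a given drive does with each bit. The function name
decode_error_recovery_flags is hypothetical and the input is a raw byte
value supplied by hand, not something read from a device.

```python
# Bit positions in byte 2 of mode page 0x01, assuming the SCSI-2 layout.
ERROR_RECOVERY_BITS = [
    ("AWRE", 7),  # Automatic Write Reallocation Enabled
    ("ARRE", 6),  # Automatic Read Reallocation Enabled
    ("TB", 5),    # Transfer Block
    ("RC", 4),    # Read Continuous
    ("EER", 3),   # Enable Early Recovery
    ("PER", 2),   # Post Error
    ("DTE", 1),   # Disable Transfer on Error
    ("DCR", 0),   # Disable Correction
]


def decode_error_recovery_flags(flags_byte):
    """Split the flags byte into a {name: 0 or 1} dictionary.

    Hypothetical helper for illustration only.
    """
    if not 0 <= flags_byte <= 0xFF:
        raise ValueError("flags_byte must fit in one byte")
    return {name: (flags_byte >> bit) & 1 for name, bit in ERROR_RECOVERY_BITS}


# 0xC0 has only the top two bits set, i.e. AWRE and ARRE enabled.
for name, value in decode_error_recovery_flags(0xC0).items():
    print(f"{name}: {value}")
```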

> 1. Enable read and write reallocation, and then do a dd to overwrite the
>    entire disk.  That will force any bad blocks to get remapped.

With AWRE and ARRE enabled in the first place I should never need to do 
the above? Right? The advantage of scanning the whole disk at once as 
above is to verify there are no problems and/or to observe the 
automatic bad block replacement doing its thing?

> > Are my modepage parameters sane? Was looking at page 0x01 because I was 
> > worried about error handling. But here's the popular 0x08 too:
[...]
> 
> Looks okay to me.  The only one you might want to play with is the WCE bit,
> which enables write caching.  That won't have any effect

WCE is the only one I've played with. Had to use bonnie to tell the 
difference. So I put it back the way I found it.

--
David Kelly N4HHE, dkelly@nospam.hiwaay.net
=====================================================================
The human mind ordinarily operates at only ten percent of its
capacity -- the rest is overhead for the operating system.
