Date:      Tue, 08 Oct 2002 11:57:11 -0500
From:      Gunther Schadow <gunther@aurora.regenstrief.org>
To:        freebsd-scsi@freebsd.org
Subject:   SCSI bad block remapping doesn't work!?#@$
Message-ID:  <3DA30E67.8000206@aurora.regenstrief.org>

Hi,

I can't seem to get my SCSI disk to remap bad blocks. Can someone
please look over my shoulder and see what I may be doing wrong? I have

$ uname -a
FreeBSD ... 4.4-RELEASE FreeBSD 4.4-RELEASE ... i386

$ camcontrol inquiry da1
pass1: <COMPAQPC WDE4360 1.52> Fixed Direct Access SCSI-2 device
pass1: Serial Number WS7010556513
pass1: 20.000MB/s transfers (20.000MHz, offset 15), Tagged Queueing Enabled

So, here are the errors:

(da1:ahc0:0:2:0): READ(06). CDB: 8 15 f6 90 80 0
(da1:ahc0:0:2:0): MEDIUM ERROR info:15f6b2 csi:ff,ff,ff,ff asc:11,1
(da1:ahc0:0:2:0): Read retries exhausted sks:80,ac
(da1:ahc0:0:2:0): READ(06). CDB: 8 15 f6 a0 70 0
(da1:ahc0:0:2:0): MEDIUM ERROR info:15f6b2 csi:ff,ff,ff,ff asc:11,1
(da1:ahc0:0:2:0): Read retries exhausted sks:80,ac
(da1:ahc0:0:2:0): READ(06). CDB: 8 15 f6 a0 20 0
(da1:ahc0:0:2:0): MEDIUM ERROR info:15f6b2 csi:ff,ff,ff,ff asc:11,1
(da1:ahc0:0:2:0): Read retries exhausted sks:80,ac
...

and so on. Apparently only one of my files was affected near the end. So,
I tried to save that file doing this:

$ dd if=thefile of=thebackup bs=8192 conv=noerror

and I seem to have all I can reasonably expect to get.
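
One caveat about that command, offered as a hedged aside: with conv=noerror
alone, dd leaves out every input block it cannot read, so everything after a
bad block ends up at a lower offset in the copy. Adding sync makes dd pad each
short or failed block with zeros up to bs, which keeps the offsets aligned, and
bs=512 limits the loss to the unreadable sectors themselves. A minimal sketch
(rescue_copy is a made-up helper name, not an existing tool; sync also pads the
final block, so the copy can be slightly longer than the original when the file
size is not a multiple of bs):

```sh
# rescue_copy SRC DST: copy SRC to DST, carrying on past read errors.
# (rescue_copy is a hypothetical name used only for this illustration.)
rescue_copy() {
    # noerror: keep going after a read error instead of stopping
    # sync:    pad every short or failed input block with zeros up to bs
    dd if="$1" of="$2" bs=512 conv=noerror,sync
}
```

It would be called as "rescue_copy thefile thebackup".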

Now, I did make sure the auto reallocation is enabled:

$ camcontrol modepage da1 -m 1
AWRE (Auto Write Reallocation Enbld):  1
ARRE (Auto Read Reallocation Enbld):  1
TB (Transfer Block):  0
RC (Read Continuous):  0
EER (Enable Early Recovery):  0
PER (Post Error):  0
DTE (Disable Transfer on Error):  0
DCR (Disable Correction):  0
Read Retry Count:  255
Correction Span:  48
Head Offset Count:  0
Data Strobe Offset Count:  0
Write Retry Count:  255
Recovery Time Limit:  0

and checked the defect lists:

$ camcontrol defects da1 -f phys -P
Got 119 defects:
59:4:-1
77:0:77
77:0:78
...
5133:6:128
5319:3:39
5365:4:121

$ camcontrol defects da1 -f phys -G
Got 0 defects.

the latter makes me suspicious. Too good to be true. I need to write
to this block to get it listed, so I thought, and I did this:

$ dd if=thefile of=thefile bs=8192 conv=noerror,sync,notrunc

just to check if we can "refresh" a file in place, the idea of which
I found pretty neat (thefile is really big, so it comes in handy to save
space).

When it came to the bad block I got the same errors as above and still:

$ camcontrol defects da1 -f phys -G
Got 0 defects.

redoing the same with

$ dd if=thebackup of=thefile bs=8192

didn't help either. No remapping took place and the bad block would still
be there.
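
For what it's worth, the kernel messages quoted earlier can be decoded by hand
to see which block keeps failing. The sketch below assumes that the log prints
the six READ(6) CDB bytes in hex and that info: holds the failing logical block
address in hex (the usual meaning of the sense information field on a disk).
decode_read6 is just an illustrative name.

```sh
# decode_read6 B0 B1 B2 B3 B4 B5: print "LBA LENGTH" in decimal for a
# READ(6) CDB given as six hex bytes without a 0x prefix.
# (decode_read6 is a hypothetical helper, written only for this example.)
decode_read6() {
    # The LBA is 21 bits: low 5 bits of byte 1, then bytes 2 and 3.
    lba=$(( ((0x$2 & 0x1f) << 16) | (0x$3 << 8) | 0x$4 ))
    len=$(( 0x$5 ))
    # In READ(6) a transfer length of 0 means 256 blocks.
    if [ "$len" -eq 0 ]; then
        len=256
    fi
    echo "$lba $len"
}

decode_read6 8 15 f6 90 80 0    # prints "1439376 128"
echo $(( 0x15f6b2 ))            # prints 1439410
```

So the first read covers blocks 1439376 through 1439503, and the reported
block 1439410 lies inside it, as it does for the other two reads. Note that
this is a whole-disk block number, not an offset into the partition.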

Finally I deleted the file and wrote a big file into the directory:

$ dd if=/dev/zero of=bigfile bs=8192

until that command failed because of disk full. Now I thought I should
get the sucker remapped, but still nothing:

$ camcontrol defects da1 -f phys -G
Got 0 defects.

I also tried to read the whole bigfile:

$ dd if=bigfile of=/dev/null bs=8192

and I still get this bad block error! So, now I'm gonna use badsect(8) to
isolate that stupid block so it won't hurt me again. In order to do that
I needed to find out the sector number(s) to use for badsect. So I did

$ dd if=/dev/da1s1e of=/dev/null bs=512

and with that I manually try to find all the bad blocks, which is a pain in
the butt! The conv=noerror option to dd doesn't seem to work and report
all the errors it finds. So now I use dd with skip and count to get a
list of all bad blocks with their relative sector numbers, counted from
the beginning of the partition:

$ dd if=/dev/da1s1e of=/dev/null bs=512
dd: /dev/da1s1e: Input/output error
1336978+0 records in
...

probe for this sector in particular

$ dd if=/dev/da1s1e of=/dev/null bs=512 skip=1336978 count=1
dd: /dev/da1s1e: Input/output error
0+0 records in
...

(also probe the next one and the previous one to be sure it's just
that one.) Then skip over it and scan the rest:

$ dd if=/dev/da1s1e of=/dev/null bs=512 skip=1336979
dd: /dev/da1s1e: Input/output error
1860+0 records in
...

now add the number of records in to where we started to get the next
sector number, probe it to be sure:

$ dd if=/dev/da1s1e of=/dev/null bs=512 skip=1338839 count=1
dd: /dev/da1s1e: Input/output error
0+0 records in
...

and skip over that one again to scan the rest

$ dd if=/dev/da1s1e of=/dev/null bs=512 skip=1338840
dd: /dev/da1s1e: Input/output error
3480+0 records in
...

and so on until the rest is read without errors.
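
The manual procedure above could be scripted roughly as follows. This is only
a sketch: it assumes dd prints a line of the form "N+M records in" on stderr
even when it stops on a read error (as in the output shown above), it does not
tell I/O errors apart from other dd failures, and scan_bad and records_in are
made-up names. Each pass adds the number of records read to the current skip
value to get the bad sector, then resumes one sector further on.

```sh
# records_in TEXT: extract N from dd's "N+M records in" status line.
records_in() {
    printf '%s\n' "$1" |
        sed -n 's/^\([0-9][0-9]*\)+[0-9][0-9]* records in$/\1/p'
}

# scan_bad DEVICE: print the partition-relative number of every
# 512-byte sector that dd fails to read.
# (Both function names are hypothetical, chosen for this illustration.)
scan_bad() {
    skip=0
    while :; do
        # A clean run to the end of the device means we are done.
        if out=$(dd if="$1" of=/dev/null bs=512 skip="$skip" 2>&1); then
            return 0
        fi
        n=$(records_in "$out")
        if [ -z "$n" ]; then
            echo "scan_bad: cannot parse dd output: $out" >&2
            return 1
        fi
        bad=$((skip + n))    # e.g. 1336979 + 1860 = 1338839
        echo "$bad"
        skip=$((bad + 1))    # step over the bad sector and carry on
    done
}
```

Run as "scan_bad /dev/da1s1e", it would print one sector number per line, in
the form badsect expects.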

This is a pain in the butt!

Now I have 6 bad sectors (fortunately only 6!). According to badsect(8)
I make a directory BAD in the root directory of that filesystem and
say:

$ badsect BAD 1336978 1338839 1342320 1343737 ...

listing all the bad blocks. Then I umount that fs and run fsck, answering
yes when it asks whether to hold the bad block. fsck warns about a
"softupdate inconsistency", and I can't get it right the first time, so I
give in to its persistent suggestions to delete the BAD/* files. Then I do
it all again after deleting bigfile, which was cross-linked with these bad
blocks, and this time it works.

Why did I have to go through those hassles? Why didn't the SCSI subsystem,
or the disk drive itself, do the bad sector remapping? I remember that 2 years
ago I had the same hassle with a different disk, and I don't remember this
automatic reallocation ever working for me, in spite of my turning it
on and double and triple checking in modepage 1 that it was indeed
enabled. What am I doing wrong????

thanks,
-Gunther





