Date: Tue, 5 Apr 2011 13:30:04 +0200
From: Borja Marcos <borjam@sarenet.es>
To: Andriy Gapon <avg@FreeBSD.org>
Cc: freebsd-scsi@FreeBSD.org
Subject: Re: propose: change some sense codes handling
Message-ID: <D10B0D62-E11E-445C-B9FA-DB4276F678B0@sarenet.es>
In-Reply-To: <4D9AF9B7.9030107@FreeBSD.org>
References: <4D9AF9B7.9030107@FreeBSD.org>

On Apr 5, 2011, at 1:15 PM, Andriy Gapon wrote:

>
> I propose the following changes:
>
> -       { SST(0x28, 0x00, SS_FATAL | ENXIO,
> +       { SST(0x28, 0x00, SS_TUR | SSQ_MANY | SSQ_DECREMENT_COUNT | EBUSY,
>             "Not ready to ready change, medium may have changed") },
> In my opinion this condition doesn't really mean a fatal error, but implies that
> we should retry while new medium "settles down".

As far as I know, this shouldn't be reported by a non-removable media
device. It should be used by removable media such as tape units,
magneto-optical drives, CDROM drives, WORMs...

Many years ago I used to write to SCSI tapes. If the operator changed a
tape, for example, while the tape was idle, the next read or write
command returned this code, indicating that there was a media change.
And it was important indeed, as our application sometimes wrote to tape
in relatively small chunks and it only rewound the tape when necessary.

So, if the system was expecting a given tape to be in the unit and it
tried to write, that attempt failed, reporting a tape change. The
software issued a rewind command and read the tape label to ensure that
it was the right tape (in which case it issued a seek to the end of the
recorded data) or created a new tape label, labelled it, etc etc.

Assuming that manufacturers are using it as expected, if this was
reported by a removable media random access device (say, a
magneto-optical disk) it should result in the disappearance of the
"changed disk" and the creation of a new disk. I mean, reread the
partition table et al., and invalidate any mount points related to the
"disappeared" device.

> In my testing this change actually helps with some USB flashdrives and
> cardreaders with slow access to media.

If a card reader reports this, I assume that either the reader has
crappy firmware _or_ it has an electrical contact problem with the
media. But ignoring this error could just lead to data loss. In the
case of a user replacing a memory card with a mounted filesystem, it
would certainly mean data loss (blocks intended for one card written to
a different card?)

> Perhaps some real SCSI devices use this sense code to signal a really "fatal"
> condition?  Please let me know.
>
> --- a/sys/cam/scsi/scsi_all.c
> +++ b/sys/cam/scsi/scsi_all.c
> @@ -1448,7 +1448,7 @@ static struct asc_table_entry asc_table[] = {
>          * the networking errnos?  ECONNRESET anyone?
>          */
>         /* DTLPWROMAEBKVF */
> -       { SST(0x29, 0x00, SS_FATAL | ENXIO,
> +       { SST(0x29, 0x00, SS_RDEF,
>             "Power on, reset, or bus device reset occurred") },
>         /* DTLPWROMAEBKVF */
>         { SST(0x29, 0x01, SS_RDEF,
>
> Align handling of this condition with the rest of the conditions in the same
> family: "Power on occurred", "SCSI bus reset occurred", "Bus device reset
> function occurred", etc.
> I don't see this particular condition should be special.
> Any insights and/or historical reasons?

I would be cautious with this. Of course, if it happened with no
outstanding operations and all data committed to media, it should be
harmless. But if you power cycle a hard disk with a dirty cache, some of
the data won't be committed to disk. If you just retry the operation and
otherwise ignore the message (which is equivalent to just logging and
retrying) you keep writing data to a possibly corrupted medium. That can
certainly lead to further corruption and make the problem worse.

My opinion, of course ;)

Borja.
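
For illustration, the tape-change handling described above (rewind,
check the label, then either seek to the end of recorded data or start a
new label) could be sketched roughly as follows. This is only a minimal
sketch under stated assumptions: struct tape_unit, enum recovery and
handle_write_sense() are hypothetical names invented here, the tape is
modelled in memory, and nothing in it is the actual CAM or sa(4) code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* ASC/ASCQ 0x28/0x00: "Not ready to ready change, medium may have changed". */
#define ASC_MEDIUM_CHANGED  0x28
#define ASCQ_MEDIUM_CHANGED 0x00

/* Hypothetical in-memory model of a tape unit (not a real driver structure). */
struct tape_unit {
    char label[16];     /* label of the tape currently loaded */
    long end_of_data;   /* position just past the recorded data */
    long position;      /* current head position */
};

/* Hypothetical outcomes of the recovery step. */
enum recovery { RECOVERY_NONE, RECOVERY_SAME_TAPE, RECOVERY_NEW_TAPE };

/*
 * Called when a write fails with the given sense code. Only the
 * "medium may have changed" code triggers recovery; any other code is
 * left to the caller.
 */
enum recovery
handle_write_sense(struct tape_unit *unit, int asc, int ascq,
    const char *expected_label)
{
    assert(unit != NULL && expected_label != NULL);

    if (asc != ASC_MEDIUM_CHANGED || ascq != ASCQ_MEDIUM_CHANGED)
        return RECOVERY_NONE;

    /* Rewind so the label at the start of the tape can be read. */
    unit->position = 0;

    if (strcmp(unit->label, expected_label) == 0) {
        /* Same tape as before: seek to the end of the recorded data. */
        unit->position = unit->end_of_data;
        return RECOVERY_SAME_TAPE;
    }

    /* Different tape (simplification: always start a fresh label). */
    strncpy(unit->label, expected_label, sizeof(unit->label) - 1);
    unit->label[sizeof(unit->label) - 1] = '\0';
    unit->end_of_data = 0;
    return RECOVERY_NEW_TAPE;
}
```

The point of the sketch is that the sense code is treated as a signal to
re-validate the medium before writing anything else, rather than as
something to retry blindly.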