From owner-freebsd-scsi@FreeBSD.ORG Tue Apr 5 12:41:32 2011
Delivered-To: freebsd-scsi@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id E0D6A106566B
    for ; Tue, 5 Apr 2011 12:41:31 +0000 (UTC)
    (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
    by mx1.freebsd.org (Postfix) with ESMTP id 2FC7C8FC0A
    for ; Tue, 5 Apr 2011 12:41:30 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101])
    by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA03610;
    Tue, 05 Apr 2011 15:41:27 +0300 (EEST)
    (envelope-from avg@FreeBSD.org)
Message-ID: <4D9B0DF7.8020104@FreeBSD.org>
Date: Tue, 05 Apr 2011 15:41:27 +0300
From: Andriy Gapon
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.15) Gecko/20110309 Lightning/1.0b2 Thunderbird/3.1.9
MIME-Version: 1.0
To: Borja Marcos
References: <4D9AF9B7.9030107@FreeBSD.org>
In-Reply-To:
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-scsi@FreeBSD.org
Subject: Re: propose: change some sense codes handling
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
X-List-Received-Date: Tue, 05 Apr 2011 12:41:32 -0000

on 05/04/2011 14:30 Borja Marcos said the following:
>
> On Apr 5, 2011, at 1:15 PM, Andriy Gapon wrote:
>
>>
>> I propose the following changes:
>>
>> -    { SST(0x28, 0x00, SS_FATAL | ENXIO,
>> +    { SST(0x28, 0x00, SS_TUR | SSQ_MANY | SSQ_DECREMENT_COUNT | EBUSY,
>>          "Not ready to ready change, medium may have changed") },
>> In my opinion this condition doesn't really mean a fatal error, but implies that
>> we should retry while the new medium "settles down".
>
> As far as I know, this shouldn't be reported by a non-removable media device.
> It should be used by removable media such as tape units, magneto-optical
> drives, CDROM drives, WORMs...
>
> Many years ago I used to write to SCSI tapes. If the operator changed a
> tape, for example, while the tape was idle, the next read or write command
> returned this code, indicating that there was a media change. And it was
> important indeed, as our application sometimes wrote to tape in relatively
> small chunks and it only rewound the tape when necessary.
>
> So, if the system was expecting a given tape to be in the unit and it tried
> to write, that attempt failed, reporting a tape change. The software issued
> a rewind command and read the tape label to ensure that it was the right
> tape (in which case it issued a seek to the end of the recorded data) or
> created a new tape label, labelled it, etc.
>
> Assuming that manufacturers are using it as expected, if this was reported
> by a removable media random access device (say, a magneto-optical disk) it
> should result in the disappearance of the "changed disk" and the creation
> of a new disk. I mean, rereading the partition table et al., and
> invalidating any mount points related to the "disappeared" device.
>
>> In my testing this change actually helps with some USB flash drives and
>> card readers with slow access to media.
>
> If a card reader reports this, I assume that either the reader has crappy
> firmware _or_ it has an electrical contact problem with the media. But just
> ignoring this error could lead to data loss. In the case of a user replacing
> a memory card with a mounted filesystem, it would certainly be a data loss
> (blocks intended for one card written to a different card?)

Interesting. Thank you for sharing this information!
Let me think about it :)

>> Perhaps some real SCSI devices use this sense code to signal a really "fatal"
>> condition? Please let me know.
>>
>> --- a/sys/cam/scsi/scsi_all.c
>> +++ b/sys/cam/scsi/scsi_all.c
>> @@ -1448,7 +1448,7 @@ static struct asc_table_entry asc_table[] = {
>>       * the networking errnos?  ECONNRESET anyone?
>>       */
>>      /* DTLPWROMAEBKVF */
>> -    { SST(0x29, 0x00, SS_FATAL | ENXIO,
>> +    { SST(0x29, 0x00, SS_RDEF,
>>          "Power on, reset, or bus device reset occurred") },
>>      /* DTLPWROMAEBKVF */
>>      { SST(0x29, 0x01, SS_RDEF,
>>
>> Align handling of this condition with the rest of the conditions in the same
>> family: "Power on occurred", "SCSI bus reset occurred", "Bus device reset
>> function occurred", etc.
>> I don't see why this particular condition should be special.
>> Any insights and/or historical reasons?
>
> I would be cautious with this. Of course if it happened with no outstanding
> operations and data committed to media, it should be harmless. But if you
> power cycle a hard disk with a dirty cache, some of the data won't be
> committed to disk. If you just retry the operation and otherwise ignore the
> message (which is equivalent to just logging and retrying) you keep writing
> data to a possibly corrupted medium. It can certainly lead to further
> corruption and make the problem worse.
>
> My opinion, of course ;)

Sure :)
But why would this particular sense code be so different from the other
similar codes that I quoted above?

-- 
Andriy Gapon
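
For illustration only, the behaviour the two proposed table entries ask for can be sketched as a small lookup-and-retry routine. This is a minimal sketch under stated assumptions, not the CAM code: the names sketch_entry, sketch_table, sketch_lookup, sketch_handle and the SK_* actions are hypothetical, and modelling SS_RDEF as "retry with a bounded count, then fail with EIO" is an assumption made here rather than something stated in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical actions, loosely modelled on the flags named in the diff. */
enum sketch_action {
    SK_FATAL,  /* fail the command immediately */
    SK_RETRY,  /* reissue the command, bounded by a retry count */
    SK_TUR     /* poll (e.g. TEST UNIT READY) before retrying, also bounded */
};

struct sketch_entry {
    uint8_t asc;               /* additional sense code */
    uint8_t ascq;              /* additional sense code qualifier */
    enum sketch_action action;
    int error;                 /* errno reported when the command finally fails */
    const char *desc;
};

/* The two entries as they would read after the proposed change (assumed mapping). */
static const struct sketch_entry sketch_table[] = {
    { 0x28, 0x00, SK_TUR, EBUSY,
      "Not ready to ready change, medium may have changed" },
    { 0x29, 0x00, SK_RETRY, EIO,
      "Power on, reset, or bus device reset occurred" },
};

/* Linear search of the table; returns NULL for codes the sketch does not know. */
static const struct sketch_entry *
sketch_lookup(uint8_t asc, uint8_t ascq)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++) {
        if (sketch_table[i].asc == asc && sketch_table[i].ascq == ascq)
            return (&sketch_table[i]);
    }
    return (NULL);
}

/*
 * Returns 0 if the caller should retry the command, or an errno value if
 * the command should fail. Each granted retry decrements *retries_left.
 */
static int
sketch_handle(uint8_t asc, uint8_t ascq, int *retries_left)
{
    const struct sketch_entry *e = sketch_lookup(asc, ascq);

    if (e == NULL)
        return (EIO);          /* unknown code: treated as fatal in this sketch */
    if (e->action == SK_FATAL || *retries_left <= 0)
        return (e->error);
    (*retries_left)--;
    return (0);
}
```

With a retry budget of one, sketch_handle(0x28, 0x00, &retries) first asks for a retry and then fails with EBUSY. Note that the sketch deliberately ignores the concern raised in the quoted reply: a real handler for 0x28/0x00 on removable media would probably also need to revalidate the medium before reissuing writes.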