Date: Tue, 28 Feb 2006 09:55:21 -0700
From: "Kenneth D. Merry"
To: ticso@cicely.de
Cc: Bernd Walter, freebsd-scsi@freebsd.org
Subject: Re: Automatic unit start broken?
Message-ID: <20060228165521.GA9261@nargothrond.kdm.org>
In-Reply-To: <20060228162647.GZ64548@cicely12.cicely.de>

On Tue, Feb 28, 2006 at 17:26:47 +0100, Bernd Walter wrote:
> On Tue, Feb 28, 2006 at 09:10:04AM -0700, Kenneth D. Merry wrote:
> > On Mon, Feb 27, 2006 at 21:43:27 +0100, Bernd Walter wrote:
> > > On Mon, Feb 27, 2006 at 01:22:54PM -0700, Kenneth D. Merry wrote:
> > > > On Mon, Feb 27, 2006 at 21:16:45 +0100, Bernd Walter wrote:
> > > > What error code do your disks return? You will probably see some console
> > > > output if GEOM has tried to read metadata off the disk and that initial
> > > > read fails.
> > > >
> > > > If the drive returns 0x04,0x02 ("Logical unit not ready, initializing cmd.
> > > > required"), CAM will attempt to spin the disk up automatically and retry
> > > > the command.
> > >
> > > During the first tests I waited 90s in loader to let all delayed spin
> > > up drives spin up.
> > > This is with recent RELENG_6 and a drive which don't spin up themself:
> > > [...]
> > > da7 at esp1 bus 0 target 10 lun 0
> > > da7: Fixed Direct Access SCSI-3 device
> > > da7: 20.000MB/s transfers (10.000MHz, offset 15, 16bit), Tagged Queueing Enabled
> > > da7: Attempt to query device size failed: NOT READY, Logical unit not ready, initial
> >
> > That's rather odd, since it looks like you've got an 0x04,0x02 response,
> > but the device must have rejected the start unit command if we failed to
> > get capacity information.
>
> At least the drive won't fail a start unit when done via camcontrol.

That's good.

> > > [...]
> > > No GEOM message about this driver until rc sends a start command and
> > > GEOM is retriggered to reread the drive:
> > > Unit started successfully
> > > GEOM_LABEL: Label for provider da7 is ufs/dump1.
> > > The following commands were used in rc:
> > > camcontrol start -n da -u 7
> > > cat /dev/null > /dev/da7
> > >
> > > Without the loader delay other disks are having problems as well:
> > > da9 at esp1 bus 0 target 14 lun 0
> > > da9: Fixed Direct Access SCSI-3 device
> > > da9: 20.000MB/s transfers (10.000MHz, offset 15, 16bit), Tagged Queueing Enabled
> > > da9: Attempt to query device size failed: NOT READY, Logical unit is in process of b
> > >
> >
> > That's a different error. We won't send a start unit in that case. The
> > error recovery action for 0x04,0x01 is to send a test unit ready every half
> > second for a minute until the device becomes ready.
> > Evidently it didn't become ready after that period of time.
>
> Possible that this works, but a minute is hardly enough for a drive
> with ID 14 - considered 6s per ID this means the given drive requires
> 84s after power-up.
> But I doubt that the kernel waits - I should have noticed waiting a
> whole minute.
> Where is the minute defined?
> If it is not solved by raising the wait to 120s it likely won't work.

Look in cam_periph.c, in camperiphscsisenseerror(), in the SS_TUR/SSQ_MANY
case. Increase the retry count to 240 and you'll get up to 240 test unit
ready commands sent every half second.

But, I think you may be right that the kernel may not be waiting. See
below. I suspect the driver is broken.

> > > On Shell:
> > > [30]cicely19# dd if=/dev/da7 bs=1k count=1 of=/dev/null
> > > 1+0 records in
> > > 1+0 records out
> > > 1024 bytes transferred in 0.008765 secs (116829 bytes/sec)
> > > [31]cicely19# camcontrol stop -n da -u 7
> > > Unit stopped successfully
> > > [32]cicely19# dd if=/dev/da7 bs=1k count=1 of=/dev/null
> > > dd: /dev/da7: Input/output error
> > > 0+0 records in
> > > 0+0 records out
> > > 0 bytes transferred in 0.004810 secs (0 bytes/sec)
> > > Exit 1
> >
> > What errors do you see on the console at that point?
> > In order for CAM to automatically spin up the disk, it needs to send back
> > 0x04,0x02 when it is spun down, and it needs to actually spin up the disk
> > in response to a start unit.
>
> I don't see anything on console.

That's strange.

> > What happens when you:
> >
> > camcontrol stop da7
> > camcontrol tur da7 -v
> > camcontrol start da7 -v
>
> [52]raven# camcontrol stop da7
> Unit stopped successfully
> [53]raven# camcontrol tur da7 -v
> Unit is not ready
> (pass8:esp1:0:10:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
> (pass8:esp1:0:10:0): CAM Status: CCB request is in progress
> Exit 1

Okay, that's wrong. The CCB status is never set properly, even though the
command was completed. It looks like the driver may be broken. It should
set the CAM status to CAM_SCSI_STATUS_ERROR in this case, but there is no
place in the driver (that I can see) where it ever sets that status.

> [54]raven# camcontrol start da7 -v
> Unit started successfully

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
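
The timing argument in this thread comes down to simple arithmetic, sketched
below in C. The helper names are hypothetical and are not taken from
cam_periph.c. The 500 ms interval and the retry count of 240 come from the
message above; the count of 120 is only inferred from "every half second for
a minute"; the 6 s per SCSI ID spin-up stagger is Bernd's estimate.

```c
/* Hypothetical helpers for illustration only; not code from cam_periph.c. */

/* Total time covered by TEST UNIT READY polling, in milliseconds. */
static unsigned int
tur_window_ms(unsigned int retries, unsigned int interval_ms)
{
	return (retries * interval_ms);
}

/* Staggered spin-up delay in seconds, assuming a fixed delay per SCSI ID. */
static unsigned int
spinup_delay_s(unsigned int scsi_id, unsigned int delay_per_id_s)
{
	return (scsi_id * delay_per_id_s);
}

/* Returns 1 if the polling window is at least as long as the spin-up delay. */
static int
window_covers_spinup(unsigned int retries, unsigned int interval_ms,
    unsigned int scsi_id, unsigned int delay_per_id_s)
{
	return (tur_window_ms(retries, interval_ms) >=
	    spinup_delay_s(scsi_id, delay_per_id_s) * 1000U);
}
```

Under these assumptions, 120 retries at 500 ms cover 60 s, which is shorter
than the 84 s estimated for a drive at ID 14, while 240 retries cover 120 s.
This only matters if the retries are actually issued, which is the open
question in the thread.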