Date:      Wed, 30 Jun 1999 22:10:43 -0600 (MDT)
From:      "Kenneth D. Merry" <ken@plutotech.com>
To:        jgreco@ns.sol.net (Joe Greco)
Cc:        wilko@yedi.iaf.nl (Wilko Bulte), scsi@freebsd.org
Subject:   Re: FreeBSD panics with Mylex DAC960SX
Message-ID:  <199907010410.WAA39523@panzer.kdm.org>
In-Reply-To: <199906301909.OAA85863@aurora.sol.net> from Joe Greco at "Jun 30, 1999 02:09:12 pm"

Joe Greco wrote...
> > > You're probably right about the device returning a size of zero.  It isn't
> > > immediately clear to me why the open routine would cause a panic, *unless*
> > > the Mylex unit returns good status for the read capacity command, but
> > > returns a capacity of 0.
> > 
> > Although this is definitely a bogus response, I don't see the point in
> > panicking the machine. An offensive message on the console, by all means.
> > A panic?
> > 
> > This remark assumes you are not booting from the raid of course :)
> 
> Couldn't boot from it 'til it was ready (which it isn't, which leads to this
> entire problem).
> 
> Okay, anyways, ddb output.  I really have no clue what I'm doing with the
> kernel debugger so if I did anything stupid and you need other data, let me
> know what to do.
> 
> I put the camcontrol statement and then a fsck -p into root's .profile so
> that it'd be a bit easier to manage this little show.
> 
> changing root device to dda0 at ahc0 bus 0 target 0 lun 0
> da0: <SEAGATE ST34371W 0484> Fixed Direct Access SCSI-2 device 
> da0: 40.0MB/s transfers (20.0MHz, offset 15, 16bit), Tagged Queueing Enabled
> da0: 4148MB (8496884 512 byte sectors: 255H 63S/T 528C)
> a0s1a
> da1 at ahc0 bus 0 target 1 lun 0
> da1: <MYLEX DAC960SX138928B5 4332> Fixed Direct Access SCSI-2 device 
> da1: 40.0MB/s transfers (20.0MHz, offset 16, 16bit), Tagged Queueing Enabled
> da1: A
> Enter full pathname of shell or RETURN for /bin/sh: 
> erase ^H, kill ^U, intr ^C
> /sbin/camcontrol cmd -n da -u 1 -v -c 25 0 0 0 0 0 0 0 0 0 -i 8 i4 i4
> camcontrol: error sending command
> (pass1:ahc0:0:1:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0 
> (pass1:ahc0:0:1:0): NOT READY
> end of camcontrol

That's odd.  No ASC or ASCQ, just a sense key.  Most SCSI-2 devices will
give you an ASC and an ASCQ.  Even so, that alone shouldn't be enough to
cause trouble for us...
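For what it's worth, here is where those fields live in fixed-format sense
data: the sense key is the low nibble of byte 2 (NOT READY is 0x2), and the
ASC and ASCQ are bytes 12 and 13, which only count if the additional sense
length in byte 7 says the device actually sent them.  This is a standalone
sketch with hypothetical names (struct sense_info, parse_fixed_sense()); it
is not CAM's own scsi_sense_data handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of fixed-format (0x70/0x71) sense data. */
struct sense_info {
	int	sense_key;	/* byte 2, low nibble */
	int	has_asc;	/* 1 if the device returned byte 12 */
	int	has_ascq;	/* 1 if the device returned byte 13 */
	uint8_t	asc;
	uint8_t	ascq;
};

/*
 * Returns 0 on success, -1 if the buffer is too short or is not
 * fixed-format sense data.  ASC/ASCQ are only reported when the
 * additional sense length (byte 7) covers them.
 */
int
parse_fixed_sense(const uint8_t *buf, size_t len, struct sense_info *si)
{
	uint8_t code;
	size_t valid;	/* number of bytes the device says it returned */

	if (len < 8)
		return (-1);
	code = buf[0] & 0x7f;
	if (code != 0x70 && code != 0x71)
		return (-1);

	si->sense_key = buf[2] & 0x0f;

	/* Byte 7 counts the bytes that follow it. */
	valid = 8 + (size_t)buf[7];
	if (valid > len)
		valid = len;	/* never read past the buffer we were given */

	si->has_asc = valid > 12;
	si->has_ascq = valid > 13;
	si->asc = si->has_asc ? buf[12] : 0;
	si->ascq = si->has_ascq ? buf[13] : 0;
	return (0);
}
```

If the Mylex is sending a short additional sense length, a parser along
these lines would see only the sense key, with has_asc and has_ascq both 0.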

[ ... ]

> db> tracede0: autosense failed: cable problem?
> /u
> dscheck(f67b3ae8,f182dd00) at dscheck+0xbb
> dastrategy(f67b3ae8,0,fa639a01,f181fc00,f181fccc) at dastrategy+0x56
> dsinit(f01f746b,20d0c,f01205d4,fa639c90,f181fccc) at dsinit+0x52
> dsopen(f01f746b,20d0c,2000,0,f181fccc) at dsopen+0x8e
> daopen(20d0c,1,2000,fa61b4c0,0) at daopen+0x2a1
> spec_open(fa639e2c,fa639e00,f01ae21d,fa639e2c,fa639ea0) at spec_open+0x161
> spec_vnoperate(fa639e2c,fa639ea0,f01712ca,fa639e2c,0) at spec_vnoperate+0x15
> ufs_vnoperatespec(fa639e2c,0,fa639f94,fa61b4c0,f016879e) at
> ufs_vnoperatespec+0x15
> vn_open(fa639f00,1,140,fa61b4c0,f021527c) at vn_open+0x3e2
> open(fa61b4c0,fa639f94,8097140,1,804ac68) at open+0xad
> syscall(27,27,804ac68,1,efbfdbb4) at syscall+0x187
> Xint0x80_syscall() at Xint0x80_syscall+0x4c
> db> 
> 
> Now, based on some trace printf's I sprinkled in dscheck, it looks to
> me like I get as far as
> 
> if (bp->b_bcount % ssp->dss_secsize)
> 	goto bad_bcount;
> 
> around line #191 of kern/subr_diskslice.c.  (you can see the "ckpt1c"
> interspersed with other output if you look carefully).  It does not
> hit the printf() right after that, so I am guessing that ssp->dss_secsize
> is probably zero.

Yeah, that would explain the divide by zero panic.
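The faulting line is an integer modulo by ssp->dss_secsize, so a zero
sector size traps before the printf() after it can run.  Below is a minimal
userland sketch of a defensive version of that check, assuming a bare
bcount/secsize pair instead of the real struct buf and struct diskslices.
check_bcount() is a hypothetical name, and this is only an illustration,
not the fix being discussed in this thread.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for the dscheck() test
 *	if (bp->b_bcount % ssp->dss_secsize) goto bad_bcount;
 * Returns 0 if the transfer size is acceptable, -1 otherwise.
 */
int
check_bcount(long bcount, long secsize)
{
	if (secsize <= 0) {
		/* Bogus geometry: complain and refuse the I/O, don't divide. */
		fprintf(stderr, "dscheck: bad sector size %ld, rejecting I/O\n",
		    secsize);
		return (-1);
	}
	if (bcount % secsize != 0)
		return (-1);	/* corresponds to the bad_bcount path */
	return (0);
}
```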

[ ... ]

> Ahh.  It doesn't crash now.
> 
> changing root device tda0 at ahc0 bus 0 target 0 lun 0
> da0: <SEAGATE ST34371W 0484> Fixed Direct Access SCSI-2 device 
> da0: 40.0MB/s transfers (20.0MHz, offset 15, 16bit), Tagged Queueing Enabled
> da0: 4148MB (8496884 512 byte sectors: 255H 63S/T 528C)
> o da0s1a
> da1 at ahc0 bus 0 target 1 lun 0
> da1: <MYLEX DAC960SX138928B5 4332> Fixed Direct Access SCSI-2 device 
> da1: 40.0MB/s transfers (20.0MHz, offset 16, 16bit), Tagged Queueing Enabled
> da1: A
> Enter full pathname of shell or RETURN for /bin/sh: 
> erase ^H, kill ^U, intr ^C
> /sbin/camcontrol cmd -n da -u 1 -v -c 25 0 0 0 0 0 0 0 0 0 -i 8 i4 i4
> camcontrol: error sending command
> (pass1:ahc0:0:1:0): READ CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0 
> (pass1:ahc0:0:1:0): NOT READY
> end of camcontrol
> /dev/rda0s1a: cFILESYSTEM CLEANk; SKIPPING CHECKpS
> 2% fragmentationlean, 127256 f1ree (296 frags, c15870 blocks, 0.
>                 )
> Whoa!
> ssp->dss_first_bsd_slice=0
> ssp->dss_nslices=2
> ssp->dss_oflags=0
> ssp->dss_secmult=0
> ssp->dss_secshift=-1
> ssp->dss_secsize=0
> da1: error reading primary partition table reading fsbn 0
> Can't open /dev/rda1s1e: Input/output error
[ ... ]
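Those secmult/secshift values are what you'd expect if the slice code
derives them from the sector size roughly as below.  This is a sketch under
assumptions I haven't checked line by line against kern/subr_diskslice.c:
secmult = secsize / DEV_BSIZE (512), and secshift = log2(secmult), or -1
when secmult isn't a power of two.  compute_secmult() is a hypothetical
name.

```c
#include <strings.h>	/* ffs() */

#ifndef DEV_BSIZE
#define DEV_BSIZE	512	/* assumed, matching the usual <sys/param.h> value */
#endif

/*
 * Assumed derivation of the slice multiplier and shift from the
 * device sector size.  secshift is -1 when no shift applies.
 */
void
compute_secmult(unsigned secsize, unsigned *secmult, int *secshift)
{
	*secmult = secsize / DEV_BSIZE;
	if (*secmult & (*secmult - 1))
		*secshift = -1;	/* not a power of two */
	else
		*secshift = ffs((int)*secmult) - 1;	/* ffs(0) == 0 gives -1 */
}
```

With secsize = 0 this gives secmult = 0 and, since ffs(0) returns 0,
secshift = -1, which matches the output above.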

> Now if I wait for just a bit,
> 
> # /sbin/camcontrol cmd -n da -u 1 -v -c "25 0 0 0 0 0 0 0 0 0" -i 8 "i4 i4"
> 284524543 512 
> 
> Okay, well, I don't know what the hell the correct fix is, but this
> will hopefully light a bulb in some SCSI guru's head.  The panic, I would
> think, has _got_ to be fixed.  If anyone has a great suggestion on how I
> can make this work properly, that's good too.

Well, Bruce's fix, as he hinted, just covers up the problem rather than
really addressing it.  It's probably a good check to have, though.

I have several questions, which I hope the diagnostic patch I've appended
will help answer.

1.  Why does the da1 announcement just print 'A' and not the rest of the
    line?

2.  Why does the camcontrol read capacity output indicate that the Mylex
    array is not ready, yet an open immediately after that seems to get
    through the read capacity just fine?

3.  Assuming the read capacity completes without an error, why does the
    Mylex return, at the very least, a bogus sector size (as indicated by
    your diagnostic output from the slice code above)?
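For reference on question 3, the READ CAPACITY(10) reply is 8 big-endian
bytes: the address of the last logical block, then the block length in
bytes.  Here's a minimal sketch of decoding it; rcap_bytes() and be32() are
hypothetical names, with be32() standing in for CAM's scsi_4btoul().

```c
#include <stdint.h>

/* Big-endian 4-byte decode, a local stand-in for scsi_4btoul(). */
static uint32_t
be32(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/*
 * Decode an 8-byte READ CAPACITY(10) reply.  Bytes 0-3: last LBA,
 * bytes 4-7: block length.  Returns capacity in bytes, or 0 if the
 * block length is zero (unusable geometry).
 */
uint64_t
rcap_bytes(const uint8_t reply[8], uint32_t *last_lba, uint32_t *blksize)
{
	*last_lba = be32(reply);
	*blksize = be32(reply + 4);
	if (*blksize == 0)
		return (0);
	/* The last LBA is zero-based, so the block count is last_lba + 1. */
	return (((uint64_t)*last_lba + 1) * (uint64_t)*blksize);
}
```

Fed the values from your later camcontrol run (last LBA 284524543, block
length 512), it reports 284524544 blocks, or 145676566528 bytes (138928MB).
A reply with a zero block length is presumably the sort of thing that would
leave dss_secsize at 0.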

Hopefully the appended patch will give me at least a clue to the answers
for 1 and 2.

So, Joe, could you:

- apply Bruce's patch (so you won't panic), or just keep the one you've got
- apply the attached patch to scsi_da.c
- boot with -v ("boot kernel -v" at the loader prompt)
- send the output from the boot

I'm rather confused by this, and I'd like to figure out what's going on.

Thanks,

Ken
-- 
Kenneth Merry
ken@plutotech.com

--ELM930802243-39452-0_
Content-Type: text/plain; charset=US-ASCII
Content-Disposition: attachment; filename=scsi_da.c.diagnostics.063099
Content-Description: scsi_da.c.diagnostics.063099
Content-Transfer-Encoding: 7bit

==== //depot/cam/sys/cam/scsi/scsi_da.c#101 - /a/ken/perforce/cam/sys/cam/scsi/scsi_da.c ====
*** /tmp/tmp.22962.0	Wed Jun 30 21:57:06 1999
--- /a/ken/perforce/cam/sys/cam/scsi/scsi_da.c	Wed Jun 30 21:56:28 1999
***************
*** 336,343 ****
--- 336,353 ----
  							 SF_RETRY_SELTO,
  					  &softc->device_stats);
  
+ 		xpt_print_path(periph->path);
+ 		printf("read capacity returned %d\n", error);
+ 		
+ 		scsi_sense_print(&ccb->csio);
+ 
+ 		xpt_print_path(periph->path);
+ 		printf("address = %d, length = %d\n",
+ 		       scsi_4btoul(rcap->addr), scsi_4btoul(rcap->length));
+ 
  		xpt_release_ccb(ccb);
  
+ 
  		if (error == 0) {
  			dasetgeom(periph, rcap);
  		}
***************
*** 1372,1378 ****
  				 * unit not supported" (0x25) error.
  				 */
  				if ((have_sense) && (asc != 0x25)
! 				 && (error_code == SSD_CURRENT_ERROR))
  					snprintf(announce_buf,
  					    sizeof(announce_buf),
  						"Attempt to query device "
--- 1382,1392 ----
  				 * unit not supported" (0x25) error.
  				 */
  				if ((have_sense) && (asc != 0x25)
! 				 && (error_code == SSD_CURRENT_ERROR)) {
! 					printf("got sense: "
! 					       "sense_key = %#x, asc = %#x, "
! 					       "ascq = %#x\n",
! 					       sense_key, asc, ascq);
  					snprintf(announce_buf,
  					    sizeof(announce_buf),
  						"Attempt to query device "
***************
*** 1380,1386 ****
  						scsi_sense_key_text[sense_key],
  						scsi_sense_desc(asc,ascq,
  								&cgd.inq_data));
! 				else { 
  					if (have_sense)
  						scsi_sense_print(
  							&done_ccb->csio);
--- 1394,1400 ----
  						scsi_sense_key_text[sense_key],
  						scsi_sense_desc(asc,ascq,
  								&cgd.inq_data));
! 				} else { 
  					if (have_sense)
  						scsi_sense_print(
  							&done_ccb->csio);
***************
*** 1403,1410 ****
--- 1417,1428 ----
  			}
  		}
  		free(rdcap, M_TEMP);
+ 		xpt_print_path(periph->path);
+ 		printf("about to print announcement\n");
  		if (announce_buf[0] != '\0')
  			xpt_announce_periph(periph, announce_buf);
+ 		xpt_print_path(periph->path);
+ 		printf("printed announcement\n");
  		softc->state = DA_STATE_NORMAL;		
  		/*
  		 * Since our peripheral may be invalidated by an error

--ELM930802243-39452-0_--

