Date:      Wed, 30 Jun 1999 19:27:26 +0200 (CEST)
From:      Wilko Bulte <wilko@yedi.iaf.nl>
To:        ken@plutotech.com (Kenneth D. Merry)
Cc:        jgreco@ns.sol.net, scsi@FreeBSD.ORG
Subject:   Re: FreeBSD panics with Mylex DAC960SX
Message-ID:  <199906301727.TAA00581@yedi.iaf.nl>
In-Reply-To: <199906292300.RAA29666@panzer.kdm.org> from "Kenneth D. Merry" at "Jun 29, 1999  5: 0:50 pm"

As Kenneth D. Merry wrote ...
> Joe Greco wrote...
> > Hello,
> > 
> > First, cool stuff in 3.X!  Hats off to you guys.
> > 
> > I have one minor issue that I am hoping is a simple fix.
> > 
> > I'm using Mylex DAC960SX SCSI-to-SCSI RAID controllers on an ASUS P2B-DS
> > motherboard, off of the onboard SCSI controller.  This is a neat gadget
> > that makes a bunch of drives look like a single SCSI target.
> > 
> > Now...  here's the problem.  The unit takes a while to start up (~60s)
> > from power on, and until it reports "STARTUP COMPLETE", FreeBSD blows
> > chunks when trying to access it.
> > 
> > In particular, when the Mylex freaks out and thinks half its disks are
> > dead (duh forgot to power them on), the startup sequence never completes,
> > and FreeBSD will sit there doing boot-panic-boot-panic-etc.  This is not
> > very gracious, and is a bit irritating since the serial console I need to
> > talk to the Mylex is on the box...
> > 
> > So, my _real_ issue is the following panic:
> 
> [ ... ]
> 
> > da1 at ahc0 bus 0 target 1 lun 0
> > da1: <MYLEX DAC960SX138928B5 4332> Fixed Direct Access SCSI-2 device 
> > da1: 40.0MB/s transfers (20.0MHz, offset 16, 16bit), Tagged Queueing Enabled
> > da1: A
> > de0: autosense failed: cable problem?
> > swapon: adding /dev/da0s1b as swap device
> > Automatic reboot in progress...
> > /dev/rda0s1a: FILESYSTEM CLEAN^M; SKIPPING CHECK
> > S
> > ^M/dev/rda0s1a: 
> > clean, 138968 frFee (296 frags, 1a7334 blocks, 0.2t% fragmentation)a
> > l trap 18: integer divide fault while in kernel mode
> > mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> > instruction pointer     = 0x8:0xf014a681
> > stack pointer           = 0x10:0xfa66b9d8
> > frame pointer           = 0x10:0xfa66ba00
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, def32 1, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 18 (fsck)
> > interrupt mask          =  <- SMP: XXX
> > trap number             = 18
> > panic: integer divide fault
> > mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
> > boot() called on cpu#1
> > 
> > syncing disks... done
> > (da1:ahc0:0:1:0): SYNCHRONIZE CACHE. CDB: 35 0 0 0 0 0 0 0 0 0 
> > (da1:ahc0:0:1:0): NOT READY
> > Automatic reboot in 15 seconds - press a key on the console to abort
> > Rebooting...
> > cpu_reset called on cpu#1
> > cpu_reset: Stopping other CPUs
> > cpu_reset: Restarting BSP
> > cpu_reset_proxy: Grabbed mp lock for BSP
> > cpu_reset_proxy: Stopped CPU 1
> > 
> > I apologize for not reproducing this on a 3.2R box but I assure you that
> > it also panics in fsck on 3.2R in what appears to be an identical manner.
> > The panic does seem to be caused by fsck - I can enter single user mode
> > just fine.
> > 
> > My guess is that the integer divide fault results from the device reporting
> > a size of zero (strictly a guess though!).  Normally, size is reported as
> > 
> > da1: <MYLEX DAC960SX138928B5 4332> Fixed Direct Access SCSI-2 device 
> > da1: 40.0MB/s transfers (20.0MHz, offset 16, 16bit), Tagged Queueing Enabled
> > da1: 138928MB (284524544 512 byte sectors: 255H 63S/T 17710C)
> > 
> > but during all of these crash-boots, the third line is
> > 
> > da1: <MYLEX DAC960SX138928B5 4332> Fixed Direct Access SCSI-2 device 
> > da1: 40.0MB/s transfers (20.0MHz, offset 16, 16bit), Tagged Queueing Enabled
> > da1: A
> 
> That should probably read "Attempt to query device size failed ...."
> 
> You may be losing characters over the serial console or something.
> 
> > If I can provide further information to assist in tracking down this bug,
> > please let me know.
> 
> My first guess is that it's happening during the open() routine, for some
> reason.  That's why fsck seems to cause the problem.
> 
> You're probably right about the device returning a size of zero.  It isn't
> immediately clear to me why the open routine would cause a panic, *unless*
> the Mylex unit returns good status for the read capacity command, but
> returns a capacity of 0.

Although this is definitely a bogus response, I don't see the point in
panicking the machine. An offensive message on the console, by all means.
But a panic?

This remark assumes you are not booting from the RAID, of course :)
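
A minimal sketch of the kind of check suggested above, written as plain
userland C and not as actual da(4)/CAM driver code. The thread does not
identify which division faults, so this only illustrates the general
pattern: reject a zero capacity with a console message before any later
arithmetic can divide by it. The names check_capacity and struct disk_info
are hypothetical, and the fixed 255-head, 63-sector geometry is an
assumption taken from the normal probe line quoted above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for what the driver remembers about a disk. */
struct disk_info {
	uint32_t sectors;        /* total number of sectors reported */
	uint32_t sector_size;    /* bytes per sector reported */
	uint32_t heads;
	uint32_t secs_per_track;
	uint32_t cylinders;
	uint64_t size_mb;
};

/*
 * Validate the values a device returned for its capacity.
 * Returns 0 and fills *out on success. Returns -1 (and complains on
 * stderr, standing in for the console) if either value is zero, so the
 * caller can fail the open instead of dividing by zero later.
 */
int
check_capacity(uint32_t sectors, uint32_t sector_size, struct disk_info *out)
{
	if (sectors == 0 || sector_size == 0) {
		fprintf(stderr,
		    "bogus capacity (%u sectors of %u bytes), refusing open\n",
		    (unsigned)sectors, (unsigned)sector_size);
		return (-1);
	}

	out->sectors = sectors;
	out->sector_size = sector_size;

	/* Assumed fake geometry, matching the "255H 63S/T" probe line. */
	out->heads = 255;
	out->secs_per_track = 63;
	out->cylinders = sectors / (out->heads * out->secs_per_track);

	/* Size in MB, computed in 64 bits to avoid overflow. */
	out->size_mb = ((uint64_t)sectors * sector_size) >> 20;

	return (0);
}
```

With the values from the normal probe (284524544 sectors of 512 bytes) this
yields 17710 cylinders and 138928 MB, matching the quoted da1 line. With a
zero in either field it returns -1 and nothing downstream gets to divide.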

--
|   / o / /  _  	 Arnhem, The Netherlands	- Powered by FreeBSD -
|/|/ / / /( (_) Bulte 	 WWW  : http://www.tcja.nl 	http://www.freebsd.org

