Date:      Sat, 30 May 1998 03:08:21 -0400
From:      tcobb <tcobb@staff.circle.net>
To:        "'shimon@simon-shapiro.org'" <shimon@simon-shapiro.org>
Cc:        "freebsd-scsi@freebsd.org" <freebsd-scsi@FreeBSD.ORG>, "freebsd-current@freebsd.org" <freebsd-current@FreeBSD.ORG>
Subject:   DPT Redux
Message-ID:  <509A2986E5C5D111B7DD0060082F32A402FAE8@freya.circle.net>

I won't respond to each of Simon's many emails over the past 24 hours,
simply because most of them were out-of-context reactions to a thread
that grew from my original DPT post yesterday.

Instead, I think that the most productive thing is to provide a bit more
of the information requested.

The system is using a single PM3334UW/2 with drives configured in the
following logical arrays:

2 1GB drives as RAID-1  (sd0)
7 4GB drives as RAID-5  (sd1)
1 4GB hot swap drive

Event #1:
One of the RAID-5 drives failed, and the DPT hardware began an automatic
rebuild onto the hot swap drive.  The DPT driver froze access to sd1: the
system remained running, but any access to sd1 hung.

I shut down and rebooted the machine (SYNC failed on shutdown), then
allowed FreeBSD to boot.  It returned the following for sd1:
sd1: <DPT RAID-5 07M0> type 0 fixed SCSI 2
sd1: Direct-Access 0MB (1 512 byte sectors)

The system then continued booting and finally panicked with a "Page Fault
in Supervisor Mode" error before mounting the drives.

I then booted the system from a DOS floppy and used DPTmgr to examine the
array.  The array was complete, but in degraded mode.  It had begun
rebuilding itself, which the specs say can happen in the background while
other accesses are going on.  I tested the redundancy info on the array AND
tested random reads on the array -- all succeeded.

So I exited DPTmgr and tried booting back into FreeBSD; the same problem
as above occurred (0MB, 1 sector, panic).  I then rebooted into DOS and
let the DPT card run its rebuild from there.  It completed about 1.5
hours later, and the array then showed as optimal.

I then rebooted into FreeBSD, which again showed the correct info.

Event #2:
This was the next day.  A hard drive in the array failed (this was the
former hot swap from above).  That left the array with no hot swap to
insert, but no data was lost.  The array was again in degraded mode, and
the card screamed bloody murder.  HOWEVER, the DPT driver did NOT hang on
access to the sd1 partition.  I successfully shut down the machine (SYNC
succeeded this time) and inserted a new hard drive into the array so that
the DPT hardware would begin rebuilding onto it.  On reboot, FreeBSD
showed the same results as above (0MB, 1 sector, panic).  Rebooting back
into DOS and running DPTmgr showed that the array was in degraded mode,
but that no data was lost and the redundancy information was all there.
It had automatically begun rebuilding with the new drive.  I tried
rebooting into FreeBSD again, with the same results (0MB, 1 sector, panic).
I rebooted back into DOS, allowed the hardware to finish its rebuild
(1.5 hours), then rebooted into FreeBSD, and it showed the correct results.


So, here's the summary for those of you who've stayed with me.

With RAID-5 and a HOT SWAP drive, a single drive failure caused the DPT
driver in FreeBSD to hang on access to the partition.  This appears to
be because the DPT hardware was automatically doing a background rebuild.

With RAID-5 and NO hot swap drive, a single drive failure does NOT cause
the DPT driver in FreeBSD to hang on access to the partition.  This
appears to be because the DPT hardware was NOT doing a background
rebuild -- there being no drive to rebuild onto.

With RAID-5 and a new drive to rebuild onto, the DPT hardware begins
automatic rebuilds of the array.  However, under these conditions the DPT
driver (or another FreeBSD component) does not correctly sense the size
information, and the kernel panics during bootup.  This symptom goes away
after the rebuild is complete.  It does not appear under DOS in the same
circumstances: DOS DPTmgr checks show the array at its correct size, and
the DPT BIOS bootup screen also shows the array at its correct size.

The super-summary is that the DPT driver, or some other FreeBSD code
component, appears not to coordinate correctly with the DPT hardware (or
not to sense its status properly) while the DPT hardware is doing a
background rebuild of the array.

This array has been running non-stop since November 1997.  Cabling is
good.  Active terminators and custom cables created by Granite are used.
Seagate and Micropolis drives are used.  The RAID-5 array is in an
external rackmount case.

-Troy Cobb
 Circle Net, Inc.
 http://www.circle.net


Here's the dmesg output, trimmed to show the relevant data.

FreeBSD 3.0-CURRENT #0: Sun May 24 04:30:04 EDT 1998
    root@kali.circle.net:/usr/src/sys/compile/BENZAITEN-4
CPU: Pentium (232.67-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x543  Stepping=3
  Features=0x8001bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,MMX>
real memory  = 134217728 (131072K bytes)
avail memory = 128147456 (125144K bytes)
DEVFS: ready for devices
DPT:  RAID Manager driver, Version 1.0.5
Probing for devices on PCI bus 0:
DPT:  PCI SCSI HBA Driver, version 1.4.2
chip0: <Intel 82437VX PCI cache memory controller> rev 0x02 on pci0.0.0
chip1: <Intel 82371SB PCI to ISA bridge> rev 0x01 on pci0.7.0
dpt0: <DPT Caching SCSI RAID Controller> rev 0x02 int a irq 9 on pci0.20.0
dpt0: DPT type 3, model PM3334UW firmware 07M0, Protocol 0 
      on port 6310 with Write-Back cache.  LED = 0000 0000 
dpt0: Enabled Options:
      Recover Lost Interrupts
      Collect Metrics
      Optimize CPU Cache
dpt0: waiting for scsi devices to settle
scbus0 at dpt0 bus 0
dpt0: Initializing Lost IRQ Timer
sd0 at scbus0 target 0 lun 0
sd0: <DPT RAID-1 07M0> type 0 fixed SCSI 2
sd0: Direct-Access 1029MB (2109328 512 byte sectors)
dpt0: waiting for scsi devices to settle
scbus1 at dpt0 bus 1
sd1 at scbus1 target 2 lun 0
sd1: <DPT RAID-5 07M0> type 0 fixed SCSI 2
sd1: Direct-Access 20503MB (41990720 512 byte sectors)
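
One way to read the bad "0MB (1 512 byte sectors)" probe line is as size arithmetic on a SCSI READ CAPACITY(10) reply. That reply reports the address of the last block and the block length, not the block count. The sketch below is hypothetical: the function names are made up for illustration, and this is not the FreeBSD sd driver code. It assumes the controller answered with a last block address of 0 while rebuilding, which the report above does not establish. Under that assumption, it reproduces the bad line exactly, alongside the correct sd0 and sd1 lines from the dmesg above.

```python
# Hypothetical sketch (not FreeBSD code): turning a SCSI READ CAPACITY(10)
# reply into a "Direct-Access NNNMB (N 512 byte sectors)" probe line.
# READ CAPACITY returns the LAST logical block address, so count = last_lba + 1.

def sectors_from_last_lba(last_lba: int) -> int:
    """Number of blocks on the device, given its last logical block address."""
    return last_lba + 1

def size_mb(sectors: int, block_len: int = 512) -> int:
    """Whole megabytes (1 MB = 1048576 bytes), truncated as C integer division would."""
    return (sectors * block_len) // (1024 * 1024)

def probe_line(last_lba: int, block_len: int = 512) -> str:
    """Format a probe line in the style of the dmesg output above."""
    n = sectors_from_last_lba(last_lba)
    return f"Direct-Access {size_mb(n, block_len)}MB ({n} {block_len} byte sectors)"

if __name__ == "__main__":
    # Last-LBA values chosen to reproduce the sector counts quoted in the report;
    # the final 0 is the ASSUMED reply from the array during a rebuild.
    for last_lba in (41990719, 2109327, 0):
        print(probe_line(last_lba))
```

Because the count is the last address plus one, a zero address field yields 1 sector rather than 0, and 512 bytes then truncates to 0MB.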
