From owner-freebsd-current Sat May 30 00:10:31 1998
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id AAA06328
        for freebsd-current-outgoing; Sat, 30 May 1998 00:10:31 -0700 (PDT)
        (envelope-from owner-freebsd-current@FreeBSD.ORG)
Received: from freya.circle.net (freya.circle.net [209.95.95.2])
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id AAA05973;
        Sat, 30 May 1998 00:09:02 -0700 (PDT)
        (envelope-from tcobb@staff.circle.net)
Received: by freya.circle.net with Internet Mail Service (5.5.1960.3)
        id ; Sat, 30 May 1998 03:08:26 -0400
Message-ID: <509A2986E5C5D111B7DD0060082F32A402FAE8@freya.circle.net>
From: tcobb
To: "'shimon@simon-shapiro.org'"
Cc: "freebsd-scsi@freebsd.org", "freebsd-current@freebsd.org"
Subject: DPT Redux
Date: Sat, 30 May 1998 03:08:21 -0400
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.1960.3)
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I won't respond to each of Simon's many emails over the past 24 hours,
simply because most of them were out-of-context reactions to a thread
that grew from my original DPT post yesterday.

Instead, I think that the most productive thing is to provide a bit
more of the information requested.

The system is using a single PM3334UW/2 with drives configured in the
following logical arrays:

  2 1GB drives as RAID-1 (sd0)
  7 4GB drives as RAID-5 (sd1)
  1 4GB hot swap

Event #1:

  1 of the RAID-5 drives fails, DPT hardware begins to auto-rebuild
  with the hot swap drive.

  DPT driver freezes access to sd1; system remains running, but
  access to sd1 hangs.

  I shut down and rebooted the machine (SYNC failed on shutdown).

  Allowed FreeBSD to boot; it returned the following for sd1:

    sd1: type 0 fixed SCSI 2
    sd1: Direct-Access 0MB (1 512 byte sectors)

  Then, the system continued booting and finally panicked with a
  "Page Fault in Supervisor Mode" error prior to mounting drives.

  I then booted the system with a DOS floppy and used DPTmgr to
  examine the array. The array was complete, but in degraded mode. It
  had begun rebuilding itself, which the specs say can happen in the
  background while other accesses are going on. I tested redundancy
  info on the array AND tested random reads on the array -- all
  succeeded.

  So, I exited DPTmgr and tried booting back to FreeBSD; the same
  problem as above occurred (0MB, 1 sector, panic).

  Then, I rebooted into DOS and let the DPT card run its rebuild from
  there. It completed about 1.5 hours later, and showed the array
  optimal. I then rebooted into FreeBSD, which showed the correct info
  again.

Event #2:

  This was the next day. Hard drive fails in array (this was the
  ex-hot swap from above). This leaves the array with no hot swap to
  insert, but no data lost. The array is now again in degraded mode.
  The card screams bloody murder.

  HOWEVER, the DPT driver does NOT hang on access to the sd1
  partition.

  I successfully shut down the machine (SYNC succeeded this time).

  I inserted a new hard drive into the array so that the DPT hardware
  would begin rebuilding with this new drive.

  On reboot, FreeBSD showed the same results as above (0MB, 1 sector,
  panic).

  Rebooting back to DOS and running DPTmgr showed that the array was
  in degraded mode, but that no data was lost and that redundancy
  information was all there. It automatically began rebuilding with
  the new drive.

  I tested rebooting into FreeBSD, same results (0/1/panic).

  Rebooted back to DOS, allowed the hardware to finish its rebuild
  (1.5 hours), rebooted to FreeBSD and it showed the correct results.

So, here's the summary for those of you who've stayed with me.

With RAID-5 and a HOT SWAP drive, a single drive failure caused the
DPT driver in FreeBSD to hang on access to the partition. This appears
to be because DPT was doing a background rebuild automatically.

With RAID-5 and NO hot swap drive, a single drive failure does NOT
cause the DPT driver in FreeBSD to hang on access to the partition.
This appears to be because DPT was NOT doing a background rebuild --
there being no drives to rebuild into.

With RAID-5 and a new drive to rebuild on, the DPT hardware begins
automatic rebuilds of the array. However, in these conditions the DPT
driver (or other FreeBSD component) does not correctly sense the size
information and panics the kernel during bootup. This symptom goes
away after the rebuild is complete. This symptom does not appear when
in DOS under the same circumstances. DOS DPTmgr checks show the array
of the correct size. BIOS bootup screen for DPT shows the array of the
correct size.

The super-summary is that it appears the DPT driver or other FreeBSD
code component is not correctly coordinating with the DPT hardware (or
sensing status properly) when the DPT hardware is doing a background
rebuild of the array.

This array has been running non-stop since November 1997. Cabling is
good. Active terminators and custom cables created by Granite are
used. Seagate and Micropolis drives are used. The RAID-5 array is in
an external rackmount case.

-Troy Cobb
 Circle Net, Inc.
 http://www.circle.net

Here's the dmesg output, trimmed to show relevant data.

FreeBSD 3.0-CURRENT #0: Sun May 24 04:30:04 EDT 1998
    root@kali.circle.net:/usr/src/sys/compile/BENZAITEN-4
CPU: Pentium (232.67-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x543  Stepping=3
  Features=0x8001bf
real memory  = 134217728 (131072K bytes)
avail memory = 128147456 (125144K bytes)
DEVFS: ready for devices
DPT:  RAID Manager driver, Version 1.0.5
Probing for devices on PCI bus 0:
DPT:  PCI SCSI HBA Driver, version 1.4.2
chip0: rev 0x02 on pci0.0.0
chip1: rev 0x01 on pci0.7.0
dpt0: rev 0x02 int a irq 9 on pci0.20.0
dpt0: DPT type 3, model PM3334UW firmware 07M0, Protocol 0
      on port 6310 with Write-Back cache.
      LED = 0000 0000
dpt0: Enabled Options: Recover Lost Interrupts Collect Metrics Optimize CPU Cache
dpt0: waiting for scsi devices to settle
scbus0 at dpt0 bus 0
dpt0: Initializing Lost IRQ Timer
sd0 at scbus0 target 0 lun 0
sd0: type 0 fixed SCSI 2
sd0: Direct-Access 1029MB (2109328 512 byte sectors)
dpt0: waiting for scsi devices to settle
scbus1 at dpt0 bus 1
sd1 at scbus1 target 2 lun 0
sd1: type 0 fixed SCSI 2
sd1: Direct-Access 20503MB (41990720 512 byte sectors)
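
For anyone who wants to check a saved boot log for the bogus-size
symptom described above (sd1 reported as "0MB (1 512 byte sectors)"),
here is a minimal sketch in Python. It is not part of the DPT driver or
any FreeBSD tool. The function names and the min_sectors threshold are
assumptions made up for illustration, and it only understands the
"Direct-Access ...MB (... byte sectors)" line format quoted in this
message.

```python
import re

# Matches lines such as:
#   sd1: Direct-Access 20503MB (41990720 512 byte sectors)
CAPACITY_RE = re.compile(
    r"^\s*(sd\d+): Direct-Access (\d+)MB \((\d+) (\d+) byte sectors\)",
    re.MULTILINE,
)


def parse_capacities(dmesg_text):
    """Return {device: (reported_mb, sectors, sector_size)} for each sd line."""
    found = {}
    for dev, mb, sectors, size in CAPACITY_RE.findall(dmesg_text):
        found[dev] = (int(mb), int(sectors), int(size))
    return found


def suspect_devices(dmesg_text, min_sectors=2048):
    """List devices whose reported sector count is implausibly small.

    min_sectors is an arbitrary illustrative threshold (1MB worth of
    512-byte sectors), not a value taken from any driver.
    """
    caps = parse_capacities(dmesg_text)
    return sorted(dev for dev, cap in caps.items() if cap[1] < min_sectors)


# Sample input copied from the boot messages quoted in this post.
GOOD = """sd0: Direct-Access 1029MB (2109328 512 byte sectors)
sd1: Direct-Access 20503MB (41990720 512 byte sectors)
"""
BAD = """sd0: Direct-Access 1029MB (2109328 512 byte sectors)
sd1: Direct-Access 0MB (1 512 byte sectors)
"""

if __name__ == "__main__":
    print(suspect_devices(GOOD))  # healthy boot
    print(suspect_devices(BAD))   # boot during the background rebuild
```

The reported megabyte figure can be cross-checked against the sector
count in the same way: 41990720 sectors of 512 bytes is 20503MB when
rounded down, which matches the healthy sd1 line.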