From owner-freebsd-stable@FreeBSD.ORG Sat Jan 20 17:08:16 2007
Return-Path:
X-Original-To: freebsd-stable@freebsd.org
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id B2EA816A405
    for ; Sat, 20 Jan 2007 17:08:16 +0000 (UTC)
    (envelope-from lists@qwirky.net)
Received: from public.aci.on.ca (aci.on.ca [205.207.148.251])
    by mx1.freebsd.org (Postfix) with ESMTP id 59D3813C469
    for ; Sat, 20 Jan 2007 17:08:16 +0000 (UTC)
    (envelope-from lists@qwirky.net)
Received: from (invalid client hostname: host address literal does not match remote client address)[127.0.0.1]
    (xtreme-156-171.dyn.aci.on.ca[69.17.156.171] port=1626)
    by public.aci.on.ca([205.207.148.252] port=25)
    via TCP with esmtp (6423 bytes) (sender: ) id for ;
    Sat, 20 Jan 2007 12:07:49 -0500 (EST)
    (Smail-3.2.0.122-Pre 2005-Nov-17 #1 built 2006-Feb-21)
Message-ID: <45B24C73.3010807@qwirky.net>
Date: Sat, 20 Jan 2007 12:08:03 -0500
From: Jeff Royle
User-Agent: Thunderbird 1.5.0.9 (Windows/20061207)
MIME-Version: 1.0
To: LI Xin
References: <45B0D996.8070704@qwirky.net> <45B0F61A.8020507@qwirky.net>
    <45B0F758.70408@delphij.net>
In-Reply-To: <45B0F758.70408@delphij.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Antivirus: avast! (VPS 0704-0, 18/01/2007), Outbound message
X-Antivirus-Status: Clean
Cc: freebsd-stable@freebsd.org
Subject: Re: 6.2 Release - Adaptec 2130SLP driver?? issue - aac driver
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lists@qwirky.net
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 20 Jan 2007 17:08:16 -0000

LI Xin wrote:
> Jeff Royle wrote:
>> Jeff Royle wrote:
>>> I could use some advice on this issue I have had with my raid controller.
>>> I am not really running much on the system yet, postfix, Pf + pflogd,
>>> rlogind, ssh, bsnmp and ntpd. While I was just reading a file with
>>> less the system stopped responding. I thought it was the network
>>> interfaces but I was able to ping the interface. Once I plugged a
>>> monitor into the system I saw this (roughly):
>>>
>>> AAC0: COMMAND TIMEOUT AFTER X number of seconds
>>>
>>> Not good :)
>>>
>>> Reset of the system resolved the issue and it booted fine. Since
>>> the controller stopped responding nothing was recorded to my logs.
>>>
>>> Now I have to figure out how to prevent that from happening again.
>>>
>>> Basic run down on the system and some history...
>>>
>>> P4 3.2GHz
>>> Asus P5MT-S MB
>>> 2 x 1GB DDR2 667 memory
>>> Adaptec 2130SLP Raid Controller + battery backup module
>>> 2 Seagate Ultra320 73GB 15k RPM (mirrored)
>>>
>>> I have run this same system hardware testing 6.2-BETA3, RC-1 and RC-2
>>> without this issue. I was using the driver released by Adaptec
>>> while testing the pre-release installs
>>> (http://www.adaptec.com/en-US/speed/raid/aac/unix/aacraid_freebsd6_drv_b11518_tgz.htm).
>>> You could say I am fairly confident in the hardware itself. I have
>>> put this system through a lot of testing since BETA3.
>>>
>>> The 6.2 release kernel has not been customized all that much, I just
>>> pulled out all the drivers I would never use. To be safe I kept
>>> just about all scsi devices/card models still in as I continued my
>>> testing of 6.2 release. Right now I am going to try taking out aac and
>>> aacp then try the driver I used in my previous tests. However,
>>> since I have run a week without this issue it will be hard/impossible
>>> to tell if this did anything to resolve it...I almost want a crash on the
>>> old driver :)
>>>
>>> So I need some advice... How best do I debug this issue?
>>>
>>> Thanks in advance for any direction you guys can offer me.
>>>
>>> Cheers,
>>>
>>> Jeff
>>>
>>>
>> It appears the driver I was using in my pre-release testing is newer
>> than the release driver.
>>
>> Stock driver in 6.2r dmesg:
>>
>> aac0: mem
>> 0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
>> aac0: New comm. interface enabled
>> aac0: Adaptec Raid Controller 2.0.0-1
>> aacp0: on aac0
>>
>> Currently using:
>>
>> aacu0: mem
>> 0xfc600000-0xfc7fffff,0xfc5ff000-0xfc5fffff irq 24 at device 1.0 on pci2
>> aacu0: New comm. interface enabled
>> aacu0: Adaptec Raid Controller 2.0.7-1
>> aacpu0: on aacu0
>>
>> Going to continue testing with the newer driver.
>
> I have some preliminary work on merging the Adaptec driver:
>
> http://people.freebsd.org/~delphij/for_review/patch-aac-vendor-b11518
>
> But one of the reviewers has advised me to request broader testing,
> especially against old cards and CLI tools, so I have held the commit
> for now.
>
> Cheers,

Well, the driver patched fine; no issues to report there. Throughput is
where I expected it to be, based on my previous testing with bonnie and
simple dd tests. So far the TIMEOUT error I noted above has not shown
itself again; time will tell on that one.

However, I have encountered an intermittent bug on boot. Roughly once
every 5-10 boots, the system hangs while probing the SCSI bus for the
drives. I saw this happen once before with the aacdu 2.0.7-1 binary
driver I was using in my 6.2-RC1 / 6.2-RC2 testing. It is now happening
a fair bit more often. Here is where it hangs...

Hung dmesg output:

--- snip ---
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xcd7ff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW at ata0-master UDMA33
aacd0: on aac0
aacd0: 69889MB (143132672 sectors)
--- end snip ---

The system does not continue on and probe the drives, as it does in a
normal boot. Normal dmesg:

--- snip ---
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW at ata0-master UDMA33
aacd0: on aac0
aacd0: 69889MB (143132672 sectors)
pass0 at aacp0 bus 0 target 0 lun 0
pass0: Fixed unknown SCSI-3 device
pass0: 3.300MB/s transfers
pass1 at aacp0 bus 0 target 3 lun 0
pass1: Fixed unknown SCSI-3 device
pass1: 3.300MB/s transfers
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/aacd0s1a
--- end snip ---

In an effort to resolve this I increased the SCSI delay in the kernel
config from 5 seconds to 10 seconds (SCSI_DELAY is given in
milliseconds):

options SCSI_DELAY=10000

It *may* have helped on one of my reboot tests: I thought the system was
going to hang again, but it proceeded. However, it definitely did not
solve the issue.

Once I am back in the office I will see if I can get some debug output
for you.

Cheers,

Jeff
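
P.S. For anyone comparing a hung boot against a good one, here is a minimal Python sketch of the comparison done by eye above: it finds the last line the hung boot printed and lists what a normal boot prints after it. The function name and the two sample strings are made up for illustration (the samples are trimmed from the dmesg excerpts in this message), and it assumes both captures come from the same kernel so the last hung line also appears verbatim in the normal log.

```python
from typing import List

# Trimmed samples from the dmesg excerpts in this message (illustrative only).
HUNG = """\
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW at ata0-master UDMA33
aacd0: on aac0
aacd0: 69889MB (143132672 sectors)
"""

NORMAL = """\
ppc0: parallel port not found.
Timecounters tick every 1.000 msec
acd0: CDRW at ata0-master UDMA33
aacd0: on aac0
aacd0: 69889MB (143132672 sectors)
pass0 at aacp0 bus 0 target 0 lun 0
pass0: Fixed unknown SCSI-3 device
pass0: 3.300MB/s transfers
"""


def lines_after_hang(hung_log: str, normal_log: str) -> List[str]:
    """Return the lines a normal boot prints after the hung boot's last line.

    Hypothetical helper: matches the last non-empty hung line against the
    normal log (searching from the end) and returns everything after it.
    Returns an empty list if that line is not found in the normal log.
    """
    hung = [ln.strip() for ln in hung_log.splitlines() if ln.strip()]
    normal = [ln.strip() for ln in normal_log.splitlines() if ln.strip()]
    if not hung:
        return normal
    last = hung[-1]
    for i in range(len(normal) - 1, -1, -1):
        if normal[i] == last:
            return normal[i + 1:]
    return []


if __name__ == "__main__":
    for line in lines_after_hang(HUNG, NORMAL):
        print(line)
```

On these samples the first line reported is the pass0 attach on aacp0, which matches the observation that the hang occurs before the pass-through devices are probed.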