From owner-freebsd-scsi Mon Dec 28 20:37:29 1998
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id UAA14713
        for freebsd-scsi-outgoing; Mon, 28 Dec 1998 20:37:29 -0800 (PST)
        (envelope-from owner-freebsd-scsi@FreeBSD.ORG)
Received: from panzer.plutotech.com (panzer.plutotech.com [206.168.67.125])
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id UAA14704
        for ; Mon, 28 Dec 1998 20:37:19 -0800 (PST)
        (envelope-from ken@panzer.plutotech.com)
Received: (from ken@localhost)
        by panzer.plutotech.com (8.9.1/8.8.5) id VAA30238;
        Mon, 28 Dec 1998 21:36:09 -0700 (MST)
From: "Kenneth D. Merry"
Message-Id: <199812290436.VAA30238@panzer.plutotech.com>
Subject: Re: Unexpected busfree
In-Reply-To: <9812281734.AA195313@pentium> from "Paul T. Haddad" at "Dec 28, 98 05:34:19 pm"
To: paul@pth.com (Paul T. Haddad)
Date: Mon, 28 Dec 1998 21:36:09 -0700 (MST)
Cc: freebsd-scsi@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL28s (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Paul T. Haddad wrote...
> Hi All,
>
> I keep getting the following messages (mostly under high load). Under sufficiently high load (make -j 8 world and a few more I/O intensive processes) the machine becomes unusable with these messages happening every few seconds.
>
> (da1:ahc0:0:1:0): SCB 0x5 - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0x9
> (da1:ahc0:0:1:0): Queuing a BDR SCB
> (da1:ahc0:0:1:0): Bus Device Reset Message Sent
> (da1:ahc0:0:1:0): no longer in timeout, status = 353
> Unexpected busfree. LASTPHASE == 0xa0
> SEQADDR == 0x14f
>
> I know someone asked the same question very recently and the suggestion was to check the drives firmware. I haven't been able to find any info on how to update the firmware on my drives (Conner CFP2107E) and would really doubt there is any upgrades out there (though if anyone knows any different please let me know).
>
> Anyways this is a very recent kernel (< 1 week old), the dmesg output is listed below. This seems to happen more often when I have the drives mounted async and happens on both da1 which is a vinum subdisk and da0 which is my main drive (I haven't noticed the problem on da2, though it may happen there too). If you have any suggestions on how to get rid of these errors I would really appreciate it.
>
> ---

You may want to hit return after 75 characters or so.

I've got some ideas on why this may be happening, and what you can try to
fix it.  See below.

[ ... ]

> Timecounter "i8254" frequency 1193182 Hz
> CPU: Pentium/P54C (586-class CPU)
>   Origin = "GenuineIntel"  Id = 0x525  Stepping=5
>   Features=0x3bf
> real memory = 100663296 (98304K bytes)
> config> quit
> avail memory = 95240192 (93008K bytes)
> Programming 16 pins in IOAPIC #0
> FreeBSD/SMP: Multiprocessor motherboard
>  cpu0 (BSP): apic id: 0, version: 0x00030010, at 0xfee00000
>  cpu1 (AP): apic id: 2, version: 0x00030010, at 0xfee00000
>  cpu2 (AP): apic id: 3, version: 0x00030010, at 0xfee00000
>  cpu3 (AP): apic id: 4, version: 0x00030010, at 0xfee00000
>  io0 (APIC): apic id: 14, version: 0x000f0011, at 0xfec00000

Interesting indeed.  You don't see many quad processor systems, especially
quad pentium systems.

[ ... ]

> da0 at ahc0 bus 0 target 0 lun 0
> da0: Fixed Direct Access SCSI-2 device
> da0: 20.0MB/s transfers (10.0MHz, offset 8, 16bit), Tagged Queueing Enabled
> da0: 2048MB (4194304 512 byte sectors: 64H 32S/T 2048C)
> da2 at ahc1 bus 0 target 5 lun 0
> da2: Fixed Direct Access SCSI-2 device
> da2: 20.0MB/s transfers (10.0MHz, offset 8, 16bit), Tagged Queueing Enabled
> da2: 2048MB (4194304 512 byte sectors: 255H 63S/T 261C)
> da1 at ahc0 bus 0 target 1 lun 0
> da1: Fixed Direct Access SCSI-2 device
> da1: 20.0MB/s transfers (10.0MHz, offset 8, 16bit), Tagged Queueing Enabled
> da1: 2048MB (4194304 512 byte sectors: 64H 32S/T 2048C)

As you indicated above, since these are older Conner drives, it is unlikely
that you'll be able to get updated firmware for them.

[ ... ]

> (da1:ahc0:0:1:0): tagged openings now 31
> (da2:ahc1:0:5:0): tagged openings now 32

[ ... ]

> (da1:ahc0:0:1:0): SCB 0x5 - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0x9
> (da1:ahc0:0:1:0): Queuing a BDR SCB
> (da1:ahc0:0:1:0): Bus Device Reset Message Sent
> (da1:ahc0:0:1:0): no longer in timeout, status = 353

The timed out while idle messages are generally caused when a device "goes
out to lunch" and does not return a request before the timeout expires.  In
the case of the da driver, the read and write timeouts are 60 seconds.  To
wake the drive up, we hit it over the head with a BDR.  This generally will
get most devices up and running again.

One possible explanation for your drive's behavior is that the firmware
doesn't react well to the queue full condition.  I would suggest limiting
the maximum number of tags that we send to the drives, to avoid the queue
full condition.

If you want to be conservative about it, you can set the maximum at 24 or
so, and see what happens.  Keep in mind that this will reduce the number of
I/O requests that can be queued to the drive at any one time.  It might,
however, fix your problem.

Go into sys/cam/cam_xpt.c.
You'll see a number of quirk entries, starting at around line 240.

Woooh, wait a second.  There's already a quirk entry for your drive!  It
says that drive is broken for tagged queueing!  But the problem is that the
removable flag is set:

        {
                /* Broken tagged queuing drive */
                { T_DIRECT, SIP_MEDIA_REMOVABLE, "CONNER", "CFP2107*", "*" },
                /*quirks*/0, /*mintags*/0, /*maxtags*/0
        },

So, just change it to:

        {
                /* Broken tagged queuing drive */
                { T_DIRECT, SIP_MEDIA_FIXED, "CONNER", "CFP2107*", "*" },
                /*quirks*/0, /*mintags*/0, /*maxtags*/0
        },

Hopefully your problem will go away.  I won't be able to commit that fix
until next week, so hopefully someone else can step in and do it.  If not,
I'll do it sometime next week.  Until then, you can run with that change
made locally, and see if it fixes your problem.

> Unexpected busfree. LASTPHASE == 0xa0
> SEQADDR == 0x14f

I can't remember exactly why these messages pop up.  Justin knows, though.
If he's around, maybe he'll explain.

> (da0:ahc0:0:0:0): tagged openings now 31

Ken
-- 
Kenneth Merry
ken@plutotech.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message
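
For readers wondering why that one flag matters, here is a minimal sketch in
C of how a quirk entry might be compared against a drive.  It is not the
code from sys/cam/cam_xpt.c: the type names, the flag values and the glob
helper are all hypothetical.  It only assumes that an entry applies when the
device type, the media flag and the three wildcard strings all match, which
is what the quirk entry quoted in the message suggests.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real constants; values are assumptions. */
enum { TYPE_DIRECT = 0x00 };
enum { MEDIA_REMOVABLE = 0x01, MEDIA_FIXED = 0x02 };

/* Hypothetical pattern half of a quirk entry. */
struct pattern {
        int type;               /* device type the entry applies to */
        int media;              /* bitmask of MEDIA_* flags it accepts */
        const char *vendor;     /* each string may contain '*' wildcards */
        const char *product;
        const char *revision;
};

/* Hypothetical description of what a drive reports about itself. */
struct device {
        int type;
        int removable;          /* non-zero if the media is removable */
        const char *vendor;
        const char *product;
        const char *revision;
};

/* Return 1 if str matches pat, where '*' matches any run of characters. */
static int glob_match(const char *pat, const char *str)
{
        for (; *pat != '\0'; pat++, str++) {
                if (*pat == '*') {
                        /* Try the rest of the pattern at every tail of str. */
                        for (;; str++) {
                                if (glob_match(pat + 1, str))
                                        return 1;
                                if (*str == '\0')
                                        return 0;
                        }
                }
                if (*str == '\0' || *pat != *str)
                        return 0;
        }
        return *str == '\0';
}

/* Return 1 only if every field of the pattern accepts the device. */
int pattern_matches(const struct pattern *p, const struct device *d)
{
        int media = d->removable ? MEDIA_REMOVABLE : MEDIA_FIXED;

        return p->type == d->type &&
            (p->media & media) != 0 &&
            glob_match(p->vendor, d->vendor) &&
            glob_match(p->product, d->product) &&
            glob_match(p->revision, d->revision);
}
```

Under these assumptions, a fixed drive described as
{ TYPE_DIRECT, 0, "CONNER", "CFP2107E", "1234" } (the revision string is made
up) is not matched by a pattern carrying MEDIA_REMOVABLE, so its tag limits
would never be applied, while the same pattern with MEDIA_FIXED does match.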