From owner-freebsd-scsi  Mon Dec 28 20:37:29 1998
Return-Path: <owner-freebsd-scsi@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id UAA14713
          for freebsd-scsi-outgoing; Mon, 28 Dec 1998 20:37:29 -0800 (PST)
          (envelope-from owner-freebsd-scsi@FreeBSD.ORG)
Received: from panzer.plutotech.com (panzer.plutotech.com [206.168.67.125])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id UAA14704
          for <freebsd-scsi@FreeBSD.ORG>; Mon, 28 Dec 1998 20:37:19 -0800 (PST)
          (envelope-from ken@panzer.plutotech.com)
Received: (from ken@localhost)
          by panzer.plutotech.com (8.9.1/8.8.5) id VAA30238;
          Mon, 28 Dec 1998 21:36:09 -0700 (MST)
From: "Kenneth D. Merry" <ken@plutotech.com>
Message-Id: <199812290436.VAA30238@panzer.plutotech.com>
Subject: Re: Unexpected busfree
In-Reply-To: <9812281734.AA195313@pentium> from "Paul T. Haddad" at "Dec 28, 98 05:34:19 pm"
To: paul@pth.com (Paul T. Haddad)
Date: Mon, 28 Dec 1998 21:36:09 -0700 (MST)
Cc: freebsd-scsi@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL28s (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Paul T. Haddad wrote...
> Hi All,
> 
> I keep getting the following messages (mostly under high load).  Under sufficiently high load (make -j 8 world and a few more I/O intensive processes) the machine becomes unusable with these messages happening every few seconds.
> 
> (da1:ahc0:0:1:0): SCB 0x5 - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0x9
> (da1:ahc0:0:1:0): Queuing a BDR SCB
> (da1:ahc0:0:1:0): Bus Device Reset Message Sent
> (da1:ahc0:0:1:0): no longer in timeout, status = 353
> Unexpected busfree.  LASTPHASE == 0xa0
> SEQADDR == 0x14f
> 
> I know someone asked the same question very recently and the suggestion was to check the drives firmware.  I haven't been able to find any info on how to update the firmware on my drives (Conner CFP2107E) and would really doubt there is any upgrades out there (though if anyone knows any different please let me know).
> 
> Anyways this is a very recent kernel (< 1 week old), the dmesg output is listed below.  This seems to happen more often when I have the drives mounted async and happens on both da1 which is a vinum subdisk and da0 which is my main drive (I haven't noticed the problem on da2, though it may happen there too).  If you have any suggestions on how to get rid of these errors I would really appreciate it.
> 
> ---

You may want to hit return after 75 characters or so.

I've got some ideas on why this may be happening, and what you can try to
fix it.  See below.

[ ... ]

> Timecounter "i8254"  frequency 1193182 Hz
> CPU: Pentium/P54C (586-class CPU)
>   Origin = "GenuineIntel"  Id = 0x525  Stepping=5
>   Features=0x3bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,APIC>
> real memory  = 100663296 (98304K bytes)
> config> quit
> avail memory = 95240192 (93008K bytes)
> Programming 16 pins in IOAPIC #0
> FreeBSD/SMP: Multiprocessor motherboard
>  cpu0 (BSP): apic id:  0, version: 0x00030010, at 0xfee00000
>  cpu1 (AP):  apic id:  2, version: 0x00030010, at 0xfee00000
>  cpu2 (AP):  apic id:  3, version: 0x00030010, at 0xfee00000
>  cpu3 (AP):  apic id:  4, version: 0x00030010, at 0xfee00000
>  io0 (APIC): apic id: 14, version: 0x000f0011, at 0xfec00000

Interesting indeed.  You don't see many quad processor systems, especially
quad pentium systems.

[ ... ]

> da0 at ahc0 bus 0 target 0 lun 0
> da0: <CONNER CFP2107E  2.14GB 1524> Fixed Direct Access SCSI-2 device 
> da0: 20.0MB/s transfers (10.0MHz, offset 8, 16bit), Tagged Queueing Enabled
> da0: 2048MB (4194304 512 byte sectors: 64H 32S/T 2048C)
> da2 at ahc1 bus 0 target 5 lun 0
> da2: <CONNER CFP2107E  2.14GB 1524> Fixed Direct Access SCSI-2 device 
> da2: 20.0MB/s transfers (10.0MHz, offset 8, 16bit), Tagged Queueing Enabled
> da2: 2048MB (4194304 512 byte sectors: 255H 63S/T 261C)
> da1 at ahc0 bus 0 target 1 lun 0
> da1: <CONNER CFP2107E  2.14GB 1524> Fixed Direct Access SCSI-2 device 
> da1: 20.0MB/s transfers (10.0MHz, offset 8, 16bit), Tagged Queueing Enabled
> da1: 2048MB (4194304 512 byte sectors: 64H 32S/T 2048C)

As you indicated above, since these are older Conner drives, it is unlikely
that you'll be able to get updated firmware for them.

[ ... ]

> (da1:ahc0:0:1:0): tagged openings now 31
> (da2:ahc1:0:5:0): tagged openings now 32

[ ... ]

> (da1:ahc0:0:1:0): SCB 0x5 - timed out while idle, LASTPHASE == 0x1, SEQADDR == 0x9
> (da1:ahc0:0:1:0): Queuing a BDR SCB
> (da1:ahc0:0:1:0): Bus Device Reset Message Sent
> (da1:ahc0:0:1:0): no longer in timeout, status = 353

The "timed out while idle" messages generally appear when a device "goes
out to lunch" and does not complete a request before the timeout expires.
In the case of the da driver, the read and write timeouts are 60 seconds.

To wake the drive up, we hit it over the head with a BDR (Bus Device
Reset).  This will generally get the device up and running again.

One possible explanation for your drive's behavior is that the firmware
doesn't react well to the queue full condition.

I would suggest limiting the maximum number of tags that we send to the
drives, to avoid the queue full condition.  If you want to be conservative
about it, you can set the maximum at 24 or so, and see what happens.  Keep
in mind that this will reduce the number of I/O requests that can be queued
to the drive at any one time.  It might, however, fix your problem.
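
To make the reasoning concrete, here is a minimal sketch of why a lower
tag limit avoids the queue full condition.  It is a toy model, not the CAM
code: struct tag_model, tag_model_submit() and the device depth used in
the note below are all made up for illustration, and the rule of shrinking
the openings to the number of active commands is an assumption about how
a host might react to queue full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side view of one drive's tagged queue. */
struct tag_model {
	int openings;          /* tags the host will keep outstanding */
	int mintags;           /* floor the host never shrinks below */
	int active;            /* commands currently outstanding */
	int queue_full_events; /* times the drive said "queue full" */
};

/*
 * Try to send one more command.  device_depth is how many commands the
 * drive's firmware can really hold (an assumed, hidden limit).
 * Returns true if the drive accepted the command.
 */
bool
tag_model_submit(struct tag_model *m, int device_depth)
{
	assert(m != NULL);

	/* Host-side throttle: at the limit, nothing goes on the bus. */
	if (m->active >= m->openings)
		return false;

	/* The drive is already full: it rejects the command. */
	if (m->active >= device_depth) {
		m->queue_full_events++;
		/* Assumed reaction: shrink openings to what is active. */
		m->openings = m->active > m->mintags ?
		    m->active : m->mintags;
		return false;
	}

	m->active++;
	return true;
}
```

In this model, with openings at 32 and a drive that really holds 30
commands, the 31st submission produces one queue full event and drops
openings to 30.  With openings capped at 24, the drive's own limit is
never reached, so firmware that mishandles queue full is never put in
that state.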

Go into sys/cam/cam_xpt.c.  You'll see a number of quirk entries, starting
at around line 240.  Woooh, wait a second.  There's already a quirk entry
for your drive!  It says that drive is broken for tagged queueing!  But the
problem is that the removable flag is set, so the entry never matches your
drives, which probe as fixed media:

	{
		/* Broken tagged queuing drive */ 
		{ T_DIRECT, SIP_MEDIA_REMOVABLE, "CONNER", "CFP2107*", "*" },
		/*quirks*/0, /*mintags*/0, /*maxtags*/0
	},

So, just change it to:

	{
		/* Broken tagged queuing drive */ 
		{ T_DIRECT, SIP_MEDIA_FIXED, "CONNER", "CFP2107*", "*" },
		/*quirks*/0, /*mintags*/0, /*maxtags*/0
	},
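
Why the one-word change matters can be shown with a simplified model of
the lookup.  This is a sketch only: the model_* names and constant values
are stand-ins invented here, and POSIX fnmatch() is swapped in for
whatever wildcard matcher the kernel really uses.  The point is that the
media type is checked along with the vendor, product and revision
strings, so a pattern marked removable can never match a fixed disk.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Stand-ins for the media flags; the values are illustrative only. */
enum { MODEL_MEDIA_REMOVABLE = 0x01, MODEL_MEDIA_FIXED = 0x02 };

/* Hypothetical, cut-down version of a quirk entry's match pattern. */
struct model_pattern {
	int media;            /* media kinds this entry applies to */
	const char *vendor;   /* glob patterns, e.g. "CFP2107*" */
	const char *product;
	const char *revision;
};

/* Hypothetical summary of what a drive reports at probe time. */
struct model_device {
	int removable;        /* nonzero for removable media */
	const char *vendor;
	const char *product;
	const char *revision;
};

/* Returns 1 if the pattern applies to the device, 0 otherwise. */
int
model_quirk_matches(const struct model_pattern *p,
    const struct model_device *d)
{
	assert(p != NULL && d != NULL);

	int media = d->removable ? MODEL_MEDIA_REMOVABLE : MODEL_MEDIA_FIXED;

	/* Wrong media type: reject before looking at any strings. */
	if ((p->media & media) == 0)
		return 0;

	return fnmatch(p->vendor, d->vendor, 0) == 0 &&
	    fnmatch(p->product, d->product, 0) == 0 &&
	    fnmatch(p->revision, d->revision, 0) == 0;
}
```

Fed a fixed drive reporting "CONNER", "CFP2107E  2.14GB" and "1524", the
removable pattern is rejected and the fixed one matches.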

That should turn tagged queueing off for these drives, and with luck your
problem will go away.  I won't be able to commit that fix until next week,
so hopefully someone else can step in and do it.  If not, I'll do it
sometime next week.  In the meantime, you can run with that change made
locally, and see if it fixes your problem.
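
If turning tagged queueing off completely costs too much performance, the
more conservative cap of 24 tags mentioned earlier could be tried with the
same kind of entry.  The fragment below is a sketch that assumes the field
layout of the entry shown above; the mintags value of 2 is a guess, not
something taken from the source file.

```c
	{
		/* Hypothetical: cap tags instead of disabling them */
		{ T_DIRECT, SIP_MEDIA_FIXED, "CONNER", "CFP2107*", "*" },
		/*quirks*/0, /*mintags*/2, /*maxtags*/24
	},
```

Only one entry for the drive should be present at a time, so this would
replace the "broken" entry rather than sit next to it.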

> Unexpected busfree.  LASTPHASE == 0xa0
> SEQADDR == 0x14f

I can't remember exactly why these messages pop up.  Justin knows, though.
If he's around, maybe he'll explain.

> (da0:ahc0:0:0:0): tagged openings now 31

Ken
-- 
Kenneth Merry
ken@plutotech.com
