From owner-freebsd-scsi Mon Feb 5 04:56:52 1996
Message-Id: <199602051212.WAA04227@orion.devetir.qld.gov.au>
To: freebsd-scsi@freebsd.org
cc: syssgm@devetir.qld.gov.au
Subject: aha1542 MBO problem in 2.0.5
Date: Mon, 05 Feb 1996 22:12:41 +1000
From: Stephen McKay
Sender: owner-freebsd-scsi@freebsd.org
Precedence: bulk

Since I revamped my machine (16->24Mb ram, DX33->DX4/100, +CD-ROM), I
have had various SCSI problems. I still run 2.0.5 (because I'm using
the PC too much to upgrade yet), and use a BT545S SCSI card to run the
disk, tape and CDROM.

I can't access my Archive 2525 at all using the bt driver, or I get
crashes and reboots (bounce buffer problem, maybe). So, I use the aha
driver. Unfortunately, I get lots of messages when accessing the disk
and the CDROM simultaneously, like:

    aha0: MBO 01 and not 00 (free)
    sd0(aha0:0:0): timed out

The timeouts are not necessarily paired with complaints about MBO.

Now, I have a wild theory about the MBO not free problem. :-)

Outgoing mailboxes are paired up with ccb's pretty early on in the aha
driver. Thereafter, ccb's are allocated, and mailboxes just come with them.
The most recently freed ccb is the next to be allocated, so when the
system is busy, it is highly likely that a ccb will be reused
immediately. This implies that the outgoing mailbox will be quickly
reused.

The manual with my BT545S proudly proclaims its multi tasking nature,
so perhaps if it gets really busy, it might postpone marking the
mailbox as read, especially since mailboxes are supposed to be used in
a round robin manner, and there are bound to be a few still free.

So, the scenario I am postulating is:

    host  - allocate and set up ccb
    host  - mark mailbox as active
    bt545 - read mailbox
    bt545 - read ccb, do the work, mark ccb done
    bt545 - interrupt host
    bt545 - (become really busy and defer updating mailbox)
    host  - reallocate same ccb
    host  - complain about mailbox still marked busy
    host  - set up ccb
    host  - mark mailbox as active
    bt545 - (finish being busy)
    bt545 - mark mailbox as free
    host  - timeout (because bt545 ignored the mailbox)

To combat this, I've changed the ccb allocation policy to reuse the
oldest rather than the newest free ccb, in the expectation that this
would access mailboxes almost round robin.

I applied the patch given below, and thrashed the disk, tape and cdrom
simultaneously (doing tar's and wc of big files in a loop) without any
failures or errors logged. I reverted to the previous kernel and MBO
not free errors turned up almost immediately. Then I got a couple of
"pid 301: sh: uid 0: exited on signal 11" type messages and hurriedly
terminated the experiment. I'm back on the patched kernel and abusing
it as I type.

So, it appears that treating one's outgoing mailboxes in the official
round robin manner is not optional. I intend to add proper round robin
code myself soon, but I'm realistic enough about my erratic spare time
to invite others to beat me to it.
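
To make the difference between the two allocation policies concrete,
here is a small user-space sketch. It is not driver code: the names
pool, ccb, mbx, NCCB, pool_get, pool_put and run are all made up for
illustration, and the only assumption carried over from the driver is
that each ccb is permanently paired with one outgoing mailbox. It
records which mailbox index is used when a single ccb is allocated and
freed repeatedly, once with newest-first (LIFO) reuse and once with
oldest-first (FIFO) reuse.

```c
#include <assert.h>
#include <stddef.h>

#define NCCB 4                  /* hypothetical pool size, not AHA_MBX_SIZE */

struct ccb {
    struct ccb *next;           /* free-list link */
    int mbx;                    /* index of the mailbox paired with this ccb */
};

struct pool {
    struct ccb ccbs[NCCB];
    struct ccb *head;           /* next ccb to hand out */
    struct ccb *tail;           /* last ccb on the free list (used by FIFO) */
};

/* Build the free list in index order: 0 -> 1 -> ... -> NCCB-1. */
static void pool_init(struct pool *p)
{
    p->head = p->tail = NULL;
    for (int i = 0; i < NCCB; i++) {
        struct ccb *c = &p->ccbs[i];
        c->mbx = i;
        c->next = NULL;
        if (p->tail != NULL)
            p->tail->next = c;
        else
            p->head = c;
        p->tail = c;
    }
}

/* Both policies allocate from the head of the free list. */
static struct ccb *pool_get(struct pool *p)
{
    struct ccb *c = p->head;
    if (c != NULL) {
        p->head = c->next;
        if (p->head == NULL)
            p->tail = NULL;
    }
    return c;
}

/* fifo != 0: append at the tail (oldest free ccb is reused first).
 * fifo == 0: push at the head (newest free ccb is reused first). */
static void pool_put(struct pool *p, struct ccb *c, int fifo)
{
    if (p->head == NULL) {      /* empty list: both policies agree */
        c->next = NULL;
        p->head = p->tail = c;
    } else if (fifo) {
        c->next = NULL;
        p->tail->next = c;
        p->tail = c;
    } else {
        c->next = p->head;
        p->head = c;
    }
}

/* Do n allocate/free cycles and record the mailbox index used each time. */
static void run(int fifo, int n, int *out)
{
    struct pool p;
    pool_init(&p);
    for (int i = 0; i < n; i++) {
        struct ccb *c = pool_get(&p);
        assert(c != NULL);
        out[i] = c->mbx;
        pool_put(&p, c, fifo);
    }
}
```

Calling run(0, 6, seq) fills seq with 0 0 0 0 0 0, so the same mailbox
is hit every time, while run(1, 6, seq) gives 0 1 2 3 0 1. FIFO reuse
only approximates round robin: once several ccb's are outstanding and
complete out of order, the free list order drifts away from the
mailbox index order.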
Anyway, here's my patch against 2.0.5 (but -current doesn't LOOK much
different):

Patch relative to "aha1542.c,v 1.45 1995/05/30 08:01:05 rgrimes Exp"

--- aha1542.c	Tue May 30 18:01:05 1995
+++ aha1542.sgm.c	Sun Feb  4 21:26:02 1996
@@ -302,6 +302,7 @@
 	long int kv_phys_xor;
 	struct aha_mbx aha_mbx;	/* all the mailboxes */
 	struct aha_ccb *aha_ccb_free;	/* the next free ccb */
+	struct aha_ccb *aha_ccb_tail;	/* end of the free ccb list */
 	struct aha_ccb aha_ccb[AHA_MBX_SIZE];	/* all the CCBs */
 	int     aha_int;	/* irq level */
 	int     aha_dma;	/* DMA req channel */
@@ -782,14 +783,20 @@
 	if (!(flags & SCSI_NOMASK))
 		opri = splbio();
 
-	ccb->next = aha->aha_ccb_free;
-	aha->aha_ccb_free = ccb;
 	ccb->flags = CCB_FREE;
+
+	ccb->next = NULL;
+	if (aha->aha_ccb_free == NULL)
+		aha->aha_ccb_free = ccb;
+	else
+		aha->aha_ccb_tail->next = ccb;
+	aha->aha_ccb_tail = ccb;
+
 	/*
 	 * If there were none, wake anybody waiting for
 	 * one to come free, starting with queued entries
 	 */
-	if (!ccb->next) {
+	if (aha->aha_ccb_free == aha->aha_ccb_tail) {
 		wakeup((caddr_t)&aha->aha_ccb_free);
 	}
 	if (!(flags & SCSI_NOMASK))
@@ -819,6 +826,8 @@
 	}
 	if (rc) {
 		aha->aha_ccb_free = aha->aha_ccb_free->next;
+		if (aha->aha_ccb_free == NULL)
+			aha->aha_ccb_tail = NULL;	/* Unnecessary, but neat. */
 		rc->flags = CCB_ACTIVE;
 	}
 	if (!(flags & SCSI_NOMASK))
@@ -1214,6 +1223,7 @@
 	 * into a free-list
 	 * this is a kludge but it works
 	 */
+	aha->aha_ccb_tail = &aha->aha_ccb[0];
 	for (i = 0; i < AHA_MBX_SIZE; i++) {
 		aha->aha_ccb[i].next = aha->aha_ccb_free;
 		aha->aha_ccb_free = &aha->aha_ccb[i];
@@ -1354,9 +1364,13 @@
 		xs->error = XS_DRIVER_STUFFUP;
 		return (TRY_AGAIN_LATER);
 	}
-	if (ccb->mbx->cmd != AHA_MBO_FREE)
+	if (ccb->mbx->cmd != AHA_MBO_FREE) {
 		printf("aha%d: MBO %02x and not %02x (free)\n",
-		    unit, ccb->mbx->cmd, AHA_MBO_FREE);
+			unit, ccb->mbx->cmd, AHA_MBO_FREE);
+		aha_free_ccb(unit, ccb, flags);
+		xs->error = XS_DRIVER_STUFFUP;
+		return (TRY_AGAIN_LATER);
+	}
 
 	/*
 	 * Put all the arguments for the xfer in the ccb

Stephen McKay.