Date:      Thu, 16 Sep 2010 13:58:13 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        a.smith@ukgrid.net
Cc:        freebsd-fs@freebsd.org, Andriy Gapon <avg@icyb.net.ua>
Subject:   Re: ZFS related kernel panic
Message-ID:  <4C91F845.4010100@FreeBSD.org>
In-Reply-To: <4C8A7B20.7090408@FreeBSD.org>
References:  <20100909140000.5744370gkyqv4eo0@webmail2.ukgrid.net> <20100909182318.11133lqu4q4u1mw4@webmail2.ukgrid.net> <4C89D6A8.1080107@icyb.net.ua> <20100910143900.20382xl5bl6oo9as@webmail2.ukgrid.net> <20100910141127.GA13056@icarus.home.lan> <20100910155510.11831w104qjpyc4g@webmail2.ukgrid.net> <20100910152544.GA14636@icarus.home.lan> <20100910173912.205969tzhjiovf8c@webmail2.ukgrid.net> <4C8A6B26.8050305@icyb.net.ua> <20100910184921.16956kbaskhrsmg4@webmail2.ukgrid.net> <4C8A7B20.7090408@FreeBSD.org>

This is a multi-part message in MIME format.
--------------050703070900040004070005
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Alexander Motin wrote:
> It looks like during timeout handling (it is quite complicated process
> when port multiplier is used) some request was completed twice. So
> original problem is probably in hardware (try to check/replace cables,
> multiplier, ...), that caused timeout, but the fact that drive was
> unable to handle it is probably a siis(4) driver bug.

Thanks to the console access provided, I have found the cause of the crash.
The attached patch should fix it. The patched system has now been running
the stress test successfully for 45 minutes, compared to crashing within a
few minutes without it.

I have also found that the timeouts reported by the driver are not fatal.
The affected commands complete correctly as soon as the driver, after
detecting a timeout, freezes new incoming requests to resolve the situation
and, as a result, idles the bus. I think these timeouts are caused by some
congestion on the SATA interface, probably introduced by the port multiplier.
This panic could be triggered only by such fake timeouts, not by real ones.
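
To illustrate the bookkeeping the patch reorders, here is a minimal
standalone sketch of the timeout mask handling. It is a simplified model,
not the driver code: struct chan_model, model_timeout(), model_complete()
and model_release_queue() are hypothetical names, and freezing the queue
on the first timeout is an assumption of the model. Only the lastto /
toslots sequence mirrors the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the channel; only the timeout state is modelled. */
struct chan_model {
    uint32_t toslots;   /* bit n set: slot n is in timeout state */
    int      frozen;    /* 1 while new requests are held back */
    int      releases;  /* number of queue releases performed */
};

/* Stand-in for xpt_release_simq(); it only records the call. */
static void
model_release_queue(struct chan_model *ch)
{
    ch->frozen = 0;
    ch->releases++;
}

/* Assumption: the first timed-out slot freezes the queue. */
static void
model_timeout(struct chan_model *ch, int slot)
{
    if (ch->toslots == 0)
        ch->frozen = 1;
    ch->toslots |= (UINT32_C(1) << slot);
}

/*
 * Same order as in the patch: decide whether this slot is the last one
 * in timeout state, clear its bit, and only then release the queue.
 */
static void
model_complete(struct chan_model *ch, int slot, int timed_out)
{
    int lastto;

    if (!timed_out) {
        lastto = (ch->toslots == (UINT32_C(1) << slot));
        ch->toslots &= ~(UINT32_C(1) << slot);
        if (lastto)
            model_release_queue(ch);
    }
}
```

In this model, with slots 2 and 5 timed out, completing slot 2 leaves the
queue frozen, and completing slot 5 releases it exactly once. In the patch
the same block also runs after the slot has been marked empty and the
per-target counters updated, so the release happens with the slot state
already cleaned up.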

-- 
Alexander Motin

--------------050703070900040004070005
Content-Type: text/plain;
 name="siis.c.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="siis.c.patch"

--- siis.c.debug	2010-09-16 11:11:59.000000000 +0100
+++ siis.c	2010-09-16 11:12:31.000000000 +0100
@@ -1209,6 +1209,7 @@ siis_end_transaction(struct siis_slot *s
 	device_t dev = slot->dev;
 	struct siis_channel *ch = device_get_softc(dev);
 	union ccb *ccb = slot->ccb;
+	int lastto;
 
 	mtx_assert(&ch->mtx, MA_OWNED);
 	bus_dmamap_sync(ch->dma.work_tag, ch->dma.work_map,
@@ -1292,11 +1293,6 @@ siis_end_transaction(struct siis_slot *s
 	ch->oslots &= ~(1 << slot->slot);
 	ch->rslots &= ~(1 << slot->slot);
 	ch->aslots &= ~(1 << slot->slot);
-	if (et != SIIS_ERR_TIMEOUT) {
-		if (ch->toslots == (1 << slot->slot))
-			xpt_release_simq(ch->sim, TRUE);
-		ch->toslots &= ~(1 << slot->slot);
-	}
 	slot->state = SIIS_SLOT_EMPTY;
 	slot->ccb = NULL;
 	/* Update channel stats. */
@@ -1305,6 +1301,13 @@ siis_end_transaction(struct siis_slot *s
 	    (ccb->ataio.cmd.flags & CAM_ATAIO_FPDMA)) {
 		ch->numtslots[ccb->ccb_h.target_id]--;
 	}
+	/* Cancel timeout state if request completed normally. */
+	if (et != SIIS_ERR_TIMEOUT) {
+		lastto = (ch->toslots == (1 << slot->slot));
+		ch->toslots &= ~(1 << slot->slot);
+		if (lastto)
+			xpt_release_simq(ch->sim, TRUE);
+	}
 	/* If it was our READ LOG command - process it. */
 	if (ch->readlog) {
 		siis_process_read_log(dev, ccb);

--------------050703070900040004070005--


