From owner-freebsd-scsi@FreeBSD.ORG Fri Jul 5 08:30:01 2013
Date: Fri, 5 Jul 2013 08:30:01 GMT
Message-Id: <201307050830.r658U1X4013653@freefall.freebsd.org>
To: freebsd-scsi@FreeBSD.org
From: Markus Gebert
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)

The following reply was made to PR kern/179932; it has been noted by GNATS.

From: Markus Gebert
To: bug-followup@FreeBSD.org, Philipp Mächler, "sean_bruno@yahoo.com"
Cc:
Subject: Re: kern/179932: [ciss] ciss i/o stall problem with HP Bl Gen8 (and HP Bl Gen7 + Storage Blade)
Date: Fri, 5 Jul 2013 10:19:58 +0200

Hey Sean

I'm glad to hear you're getting the same controller as ours to test. In the meantime, the ciss changes backported from head seem to help a lot on the G8 blades with the p220 controllers, so it's quite likely that the G8 problem is already fixed in head. Of course, we can't be sure yet, but it might still be better to focus on the G7 with the p410 and storage blade, where the issue has occurred even with ciss from head. So it's good you're getting a p410.

We discussed your test scenario. ZFS is known to go nuts and generate a lot of IO once a zpool gets quite full, so is your goal just to maximise IO to reproduce the problem more reliably? Or is there a specific reason why you want us to fill a zpool?

Our problem is that half of the G7 blades are in production, so filling the zpool is not an option there. The other half is where the first half replicates all its data to, so they're kind of a hot standby and we're more flexible about running tests there, but we still have to keep the replication running, which makes filling the pool impossible as well.

The day before yesterday we installed the patched kernel, with ciss from head and CISS_DEBUG defined, on all these standby systems. We run zpool scrubs non-stop on all of them to generate IO, and as they are replication targets, they also receive some amount of write IO. That way, we hope to get a system to stall more often, so we can make quicker progress debugging the G7 problem. If you think more write IO would help, we can look into using iozone, but as stated before, we won't be able to do things like filling the zpool.

Also, once a G7 blade stalls, is there any information apart from alltrace and the DDB ciss debug print that you want us to pull out of the system?
When reading through the ciss driver source, I noticed that the DDB print may only output information about the first controller. Since the storage blade contains a second p410, do you think it would be worth altering the debug function to print information about every ciss controller in the system? I've put a rough sketch of what I have in mind below.

Markus
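Here's a rough, untested sketch of the kind of change I mean. It would live in sys/dev/ciss/ciss.c next to the existing DDB command, so struct ciss_softc and the driver's per-adapter print routine ciss_print_adapter() are in scope. It assumes the per-unit softcs are reachable through the driver's devclass (standard newbus behaviour), and the command name ciss_prt_all is just made up:

    #include <sys/param.h>
    #include <sys/bus.h>
    #include <ddb/ddb.h>

    /*
     * Print adapter information for every attached ciss(4) instance,
     * instead of only the single cached softc the current command uses.
     */
    DB_COMMAND(ciss_prt_all, db_ciss_prt_all)
    {
            devclass_t dc;
            struct ciss_softc *sc;
            int unit, maxunit;

            dc = devclass_find("ciss");
            if (dc == NULL) {
                    db_printf("no ciss devclass\n");
                    return;
            }
            maxunit = devclass_get_maxunit(dc);
            for (unit = 0; unit < maxunit; unit++) {
                    sc = devclass_get_softc(dc, unit);
                    if (sc == NULL)
                            continue;
                    db_printf("ciss%d:\n", unit);
                    ciss_print_adapter(sc);
            }
    }

From DDB this would then be run as "ciss_prt_all" and should cover the second p410 in the storage blade as well.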